COHERENT MEASUREMENT OF FACTOR RISKS 



Alexander S. Cherny*, Dilip B. Madan** 

* Moscow State University 
Faculty of Mechanics and Mathematics 
Department of Probability Theory 
119992 Moscow, Russia 
E-mail: cherny@mech.math.msu.su
Webpage: http://mech.math.msu.su/~cherny



** Robert H. Smith School of Business 
Van Munching Hall 
University of Maryland 
College Park, MD 20742 
E-mail: dmadan@rhsmith.umd.edu
Webpage: http://www.rhsmith.umd.edu/faculty/dmadan

Abstract. We propose a new procedure for the risk measurement of large 
portfolios. It employs the following objects as the building blocks: 

• coherent risk measures introduced by Artzner, Delbaen, Eber, and Heath; 

• factor risk measures introduced in this paper, which assess the risks driven 
by particular factors like the price of oil, S&P500 index, or the credit spread; 

• risk contributions and factor risk contributions, which provide a coherent 
alternative to the sensitivity coefficients. 

We also propose two particular classes of coherent risk measures called Alpha V@R 
and Beta V@R, for which all the objects described above admit an extremely simple 
empirical estimation procedure. This procedure uses no model assumptions on the 
structure of the price evolution. 

Moreover, we consider the problem of the risk management on a firm's level. It 
is shown that if the risk limits are imposed on the risk contributions of the desks to 
the overall risk of the firm (rather than on their outstanding risks) and the desks 
are allowed to trade these limits within a firm, then the desks automatically find 
the globally optimal portfolio. 

1 Introduction



1. Overview. One of the basic problems of finance is: How to measure risk in a proper 
way? 

The two best-known approaches to this problem that are widely used in practice are
variance and V@R. However, both of them have serious drawbacks. Variance penalizes
high profits in the same way as high losses. Furthermore, the corresponding gain-to-loss
ratio $S(X) = EX / \sqrt{\operatorname{Var} X}$, known as the Sharpe ratio, does not have the monotonicity property:
$X \le Y$ does not imply that $S(X) \le S(Y)$. In particular, an arbitrage possibility might
have a very low Sharpe ratio. Concerning V@R, it takes into account only the quantile
of the distribution without caring about what is happening to the left and to the right 
of the quantile. To put it another way, V@R is concerned only with the probability of 
the loss and does not care about the size of the loss. However, it is obvious that the 
size of the loss should be taken into account. Let us remark in this connection that in 
the study of the default risk the main two characteristics of a default are its probability 
and its severity. Further criticism of variance and V@R can be found in as well as in 
numerous discussions in financial journals. 

Recently, a new very promising approach to measure risk was proposed in the landmark 
papers by Artzner, Delbaen, Eber, and Heath 0, (these are the financial and the 
mathematical versions of the same paper). These authors introduced the notion of a 
coherent risk measure. Since then, the theory of coherent risk measures has rapidly been 
evolving; it already occupies a considerable part of the modern financial mathematics. 
Let us cite the papers 0, 0, 0, 0, 03, 03, El) EH) EH, 03, EE EH, EE EE 
jHI] , to mention only a few. Nice reviews on the theory of coherent risk measures are given 
in [201, ESI Ch. 4], Much of the current research in this area deals with defining 

properly a dynamic risk measure (let us mention in this connection the papers 01], |23| . 
|35j . 03) El)- Currently, more and more research in the theory of coherent risk measures 
is related to applications to problems of finance rather than to the study of "pure" risk 
measures. In particular, the problem of capital allocation was considered in 0, |14j . 
|2U] . Ell) EE EE EH) EI]; the problem of pricing and hedging was investigated in 0, 
00, HE d) HE 03, 03, EE EE EH], El) EE EH; the problem of the optimal 
portfolio choice was studied in 03, Ell) EE the equilibrium problem was considered 
in 0, 0, 03, EE EE El- This list is very far from being complete; for example, on 
the Gloria Mundi web page over two hundred papers are related to coherent risk measures. 
The investigations mentioned above show that the whole of finance can be built on
coherent risk measures. This is not surprising because risk (≈ uncertainty) is at the very
basis of finance, and a new way of measuring risk yields new approaches to all the basic
problems of finance. In some sources, the theory of coherent risk measures and their
applications is already called the "third revolution in finance" (see [60]).

2. Coherent risk. A coherent risk measure is a functional defined on the space of 
random variables that has the following form 

$$\rho(X) = -\inf_{Q \in \mathcal{D}} E_Q X. \qquad (1.1)$$

Here X means the P&L produced by some portfolio over the unit time period (for example,
one day) and $\mathcal{D}$ is a set of probability measures (all the measures from $\mathcal{D}$ are assumed
to be absolutely continuous with respect to the original measure P). From the financial
point of view, $\rho(X)$ is the risk of the portfolio. Formula (1.1) has a straightforward economic
interpretation: we have a family $\mathcal{D}$ of probabilistic scenarios; under each scenario
we calculate the average P&L; then we take the worst case. Let us remark that typically
a coherent risk measure is defined as a functional on random variables satisfying certain
properties, and the representation theorem states that it should have the form (1.1).

The notion of a coherent risk measure is very convenient theoretically, but when applying
it in practice the following problem arises immediately: How to estimate $\rho(X)$
empirically? Of course, representation (1.1) cannot be used for the practical calculation;
furthermore, the class of coherent risk measures (= the class of different sets $\mathcal{D}$) is very
wide. Thus, for practical purposes one needs to select a subclass of coherent risk measures
that admits an easy estimation procedure. One of the best subclasses known so far
is Tail V@R. Tail V@R of order $\lambda \in (0, 1]$ is the coherent risk measure $\rho_\lambda$ corresponding
to $\mathcal{D}_\lambda = \{Q : dQ/dP \le \lambda^{-1}\}$. It is easy to check that $\rho_\lambda(X) = -E(X \mid X \le q_\lambda(X))$, where
$q_\lambda(X)$ is the $\lambda$-quantile of X. Thus, $\rho_\lambda$ is a very simple functional. However, sceptics
can propose the following argument: it is already hard to estimate $q_\lambda(X)$ due to rare tail
data and it is much harder to estimate the tail mean, so the empirical estimation of $\rho_\lambda$ is
problematic.
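
For a finite sample, the infimum in the definition of Tail V@R can be evaluated directly: the worst measure in the set {Q : dQ/dP ≤ 1/λ} loads the maximal density 1/λ on the lowest outcomes until the total mass 1 is used up. The Python sketch below illustrates this. It assumes that the empirical distribution of the observations stands in for Law X; the function names `tail_var` and `empirical_tail_var` are ours and do not come from the paper.

```python
from fractions import Fraction


def tail_var(outcomes, probs, lam):
    """Tail V@R of order lam for a finite P&L distribution.

    Evaluates -inf{E_Q X : dQ/dP <= 1/lam} by putting the maximal
    allowed density 1/lam on the worst outcomes first.
    """
    if not 0 < lam <= 1:
        raise ValueError("lam must lie in (0, 1]")
    remaining = lam      # P-mass that the worst-case measure still has to cover
    worst_mean = 0       # accumulates E[X * Z], where Z = dQ/dP
    for x, p in sorted(zip(outcomes, probs)):
        take = min(p, remaining)
        worst_mean += x * take / lam
        remaining -= take
        if remaining <= 0:
            break
    return -worst_mean


def empirical_tail_var(sample, lam):
    """Tail V@R of the empirical distribution of a time series (illustrative)."""
    n = len(sample)
    return tail_var(sample, [1.0 / n] * n, lam)


# Hypothetical position: lose 100 with probability 1/25, gain 1 otherwise.
risk = tail_var([-100, 1], [Fraction(1, 25), Fraction(24, 25)], Fraction(1, 20))
```

Exact fractions are used in the example only to avoid rounding at the boundary of the tail; with floating-point inputs the same function applies unchanged.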

In this paper, we propose a one-parameter family of coherent risk measures, which are
extremely easy to estimate. This is the class of risk measures of the form

$$\rho_\alpha(X) = -E \min_{i=1,\dots,\alpha} X_i,$$

where $\alpha$ is a fixed natural number and $X_1, \dots, X_\alpha$ are independent copies of X. The
parameter $\alpha$ controls the risk aversion of $\rho_\alpha$: the higher $\alpha$ is, the more risk averse $\rho_\alpha$ is.
We call $\rho_\alpha$ Alpha V@R. This family of risk measures is a very good substitute for Tail
V@R. It has the following advantages:

• $\rho_\alpha$ is very intuitive;

• $\rho_\alpha$ depends on the whole distribution of X and not just on the tail, as $\rho_\lambda$ does;

• it is very easy to estimate $\rho_\alpha(X)$ from the time series for X.

Let us remark that for large data sets the empirical estimation of $\rho_\alpha$ from time series is
faster than the empirical estimation of V@R (!) because it does not require the ordering
of the time series. We believe that $\rho_\alpha$ is the best one-parameter family of coherent risk
measures.
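
One way to read the definition of Alpha V@R empirically is to resample: draw α observations from the time series, keep the smallest, and average over many trials. The Python sketch below does this; it is our illustration and not necessarily the procedure the paper develops later. The second function, which does sort the data, is included only as a cross-check of the resampling estimate. The function names, the number of trials, and the seed are assumptions made for the example.

```python
import random


def alpha_var_resampled(sample, alpha, n_trials=20000, seed=0):
    """Monte Carlo estimate of rho_alpha(X) = -E min(X_1, ..., X_alpha).

    Each trial draws alpha observations with replacement from the time
    series and keeps the smallest one; the series itself is never sorted.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        total += min(rng.choice(sample) for _ in range(alpha))
    return -total / n_trials


def alpha_var_exact(sample, alpha):
    """Exact -E min of alpha i.i.d. draws from the empirical distribution.

    This variant sorts the data and serves only as a cross-check.
    """
    xs = sorted(sample)
    n = len(xs)
    expected_min = 0.0
    for k, x in enumerate(xs):
        # Probability that the minimum lands on the k-th smallest point.
        weight = ((n - k) / n) ** alpha - ((n - k - 1) / n) ** alpha
        expected_min += x * weight
    return -expected_min


# Hypothetical daily P&L observations.
pnl = [-3.0, -1.0, 0.0, 2.0]
estimate = alpha_var_resampled(pnl, alpha=2)
reference = alpha_var_exact(pnl, alpha=2)
```

For α = 1 both functions reduce to minus the sample mean, and increasing α raises the measured risk, in line with the role of α as the risk aversion parameter.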

Alpha V@R is a subclass of the two-parameter family of coherent risk measures, which
we call Beta V@R. This is the class of risk measures of the form

$$\rho_{\alpha,\beta}(X) = -E\Bigl[\frac{1}{\beta}\sum_{i=1}^{\beta} X_{(i)}\Bigr],$$

where $\beta \le \alpha$ are fixed natural numbers and $X_{(1)}, \dots, X_{(\alpha)}$ are the order statistics obtained
from the sequence $X_1, \dots, X_\alpha$ of independent copies of X. In particular, $\rho_{\alpha,1} = \rho_\alpha$. We
believe that $\rho_{\alpha,\beta}$ is the best two-parameter family of coherent risk measures.
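
The same resampling idea gives a rough empirical reading of Beta V@R: draw α observations, average the β smallest, and repeat. The sketch below assumes the average of the β lowest order statistics (the 1/β normalization), which is what translation invariance requires; the function name and the default parameters are ours.

```python
import random


def beta_var_resampled(sample, alpha, beta, n_trials=20000, seed=0):
    """Monte Carlo estimate of Beta V@R with parameters beta <= alpha.

    Each trial draws alpha observations with replacement and averages
    the beta smallest of them (assumed 1/beta normalization).
    """
    if not 1 <= beta <= alpha:
        raise ValueError("need 1 <= beta <= alpha")
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        draws = sorted(rng.choice(sample) for _ in range(alpha))
        total += sum(draws[:beta]) / beta
    return -total / n_trials


# Hypothetical daily P&L observations.
pnl = [-3.0, -1.0, 0.0, 2.0]
most_averse = beta_var_resampled(pnl, alpha=5, beta=1)   # coincides with Alpha V@R
least_averse = beta_var_resampled(pnl, alpha=5, beta=5)  # estimates -E X
```

With β = 1 the estimator coincides with the Alpha V@R resampling estimate, and with β = α it estimates minus the mean, so β interpolates between the two for a fixed α.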

3. Factor risk. The theory of coherent risk measures is already rather rich. However, 
so far, an important issue has been absent: How to measure the separate risks of a portfolio 
induced by factors like the price of oil, S&P 500 index, or the credit spread? Measuring 
these risks separately is very important. Suppose, for example, that a significant change 
in the price of oil is expected in the near future. Then having a big exposure to the price 
of oil is dangerous. 

Of course, one might argue at this point that $\rho(X)$ depends on Law X, and this
distribution already takes into account high risk induced by possible oil price moves (as
well as all the other risks), so that there is no need to consider separate risks. Our reply to
this possible criticism is as follows. Suppose that we are trying to assess empirically the 
risk of a large portfolio. As different assets in a large portfolio have different durations (like 
options or bonds), a joint time series for all of them does not exist. So, for the empirical 
estimation we should consider the main factors driving the risk, express the values of all 
the assets through these factors, and use time series for the factors. Another advantage 
of factor risks is as follows. When choosing data for the empirical risk estimation, there
is always a conflict between accuracy and responsiveness to recent changes (the more data
we take, the more accurate the estimates are, but the slower they react to recent changes).
However, if we are looking for the separate risks driven by separate factors, then this 
conflict can be resolved. Namely, for a factor one can take time series whose time step is 
stretched according to the current volatility of the factor (for details, see Section EI). This 
time change procedure enables one to use arbitrarily large data sets and at the same time 
immediately react to the volatility changes. 

We propose a way for the coherent measurement of factor risks. Let X be the P&L of a
portfolio produced over a unit time period and Y be the increment of some factor or factors
over this period (Y might be multidimensional). The factor risk of X induced by Y is
defined as

$$\rho^f(X; Y) = -\inf_{Q \in E(\mathcal{D} \mid Y)} E_Q X,$$

where $\mathcal{D}$ is the set standing in (1.1) and

$$E(\mathcal{D} \mid Y) := \{E(Z \mid Y) : Z \in \mathcal{D}\}.$$

Here we identify measures from $\mathcal{D}$ with their densities with respect to P. Thus, $\rho^f(\,\cdot\,; Y)$
is again a coherent risk measure.
It is easy to check that

$$\rho^f(X; Y) = \rho(E(X \mid Y)).$$

This expression clarifies the essence of factor risk: $\rho^f(X; Y)$ takes into account the risk
of X driven only by Y and cuts off all the other risks. The value $\rho^f(X; Y)$ might be
looked at as the coherent counterpart of the sensitivity of X to Y. However, an important
difference from the sensitivity analysis is the non-linearity of $\rho^f(X; Y)$.
The notion of factor risk can be employed in two forms:

1. We take one factor (i.e. Y is one-dimensional), and then $\rho^f$ measures the risk
induced by this factor.

2. We take all the main factors driving the risk (i.e. Y is multidimensional), and
then $\rho^f$ measures the "non-diversifiable" risk and serves as a good approximation
to $\rho$. In other words, we take the main factors, express the values of all the
assets in the portfolio through these factors, and look at the coherent risk. This is
a standard procedure of risk measurement; the only difference is that coherent
risk measures are employed.

A very pleasant feature of $\rho^f$ is that it is easily calculated for large portfolios. Suppose,
for example, that our portfolio consists of N assets, so that $X = X^1 + \dots + X^N$. Then

$$\rho^f\Bigl(\sum_{n=1}^{N} X^n; Y\Bigr) = \rho\Bigl(\sum_{n=1}^{N} f^n(Y)\Bigr),$$

where $f^n(y) = E(X^n \mid Y = y)$. In order to calculate this value, we need not know the
joint distribution of $X^1, \dots, X^N$ (if the portfolio consists of several thousand assets,
finding this distribution is not a very pleasant problem). Instead, we only need the joint
distributions of $(X^n, Y)$, n = 1, ..., N. In essence, there is no difference whether our
portfolio consists of 1000 or 1000000 assets because the different functions $f^n$ are joined simply by
summation.
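
The aggregation step can be sketched in a few lines of Python: the conditional-mean functions of the individual assets are summed along the factor history, and a coherent risk measure is applied to the resulting one-dimensional series. In this sketch the risk measure is Alpha V@R estimated by resampling, and the two conditional-mean functions and the factor history are hypothetical; in practice they would come from pricing models and market data.

```python
import random


def alpha_var(values, alpha, n_trials=20000, seed=0):
    """Resampling estimate of Alpha V@R: -E min of alpha draws from values."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        total += min(rng.choice(values) for _ in range(alpha))
    return -total / n_trials


def factor_risk(conditional_means, factor_history, alpha, n_trials=20000, seed=0):
    """Factor risk of a portfolio: rho applied to the sum of f^n(Y).

    conditional_means: one function y -> E(X^n | Y = y) per asset
    factor_history:    observed increments of the factor Y
    """
    aggregated = [sum(f(y) for f in conditional_means) for y in factor_history]
    return alpha_var(aggregated, alpha, n_trials, seed)


# Hypothetical inputs: five factor increments and two assets.
factor_history = [-0.04, -0.01, 0.00, 0.02, 0.03]
conditional_means = [
    lambda y: 100.0 * y,                  # linear exposure to the factor
    lambda y: 50.0 * max(y, 0.0) - 0.5,   # option-like exposure
]
risk = factor_risk(conditional_means, factor_history, alpha=5)
```

Adding an asset only appends one more function to the list, so the cost grows linearly in the number of assets and no joint distribution of the assets is ever formed.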

For coherent factor risks based on Alpha V@R and Beta V@R, we consider an empirical 
estimation procedure similar to the historic V@R estimation (for the description of these 
methods, see |1HJ Sect. 6]). This procedure has the following advantages: 

• arbitrarily large data sets for factors are available; 

• for one-factor risks we can use the time change procedure, which enables one to use 
arbitrarily large data sets and immediately react to the volatility changes; 

• the procedure is simple and works at the same speed as the historic V@R estimation; 

• the use of coherent risk measures is much wiser than the use of V@R; 

• the procedure is completely non-linear; 

• the procedure employs no model assumptions (except, of course, for those used in
the calculation of $E(X \mid Y = y)$).

4. Portfolio optimization. The main message of the first part of the paper is: it is
reasonable to assess the risk of a position not just as a number, but as a vector of factor
risks $\rho^f(\,\cdot\,; Y^1), \dots, \rho^f(\,\cdot\,; Y^M)$. We also consider the problem of portfolio optimization
when the constraints on the portfolio are given as $\rho^f(X; Y^m) \le c_m$, m = 1, ..., M. It
turns out that this problem admits a simple geometric solution that is similar to the one
given in [1J2 Subsect. 2.2] for the case of a single constraint $\rho(X) \le c$. We do not insist
that this solution is the one that should be implemented in practice, but it gives a nice 
theoretic insight into the form of the optimal portfolio. A possible practical approach to 
this problem is also discussed. 

5. Risk contribution. The functional $\rho$ measures the outstanding risk of a portfolio.
However, if a big firm assesses the risk of a trade, it should take into account not the
outstanding risk of the trade, but rather its impact on the risk of the whole firm. Thus, if
the P&L of the trade is X and the P&L produced over the same period by the whole firm
is W, the quantity of interest is $\rho(W + X) - \rho(W)$. If X is small as compared to W,
then a good approximation to this difference is the risk contribution of X to W. The
notion of risk contribution based on coherent risk measures was considered in [21], |26j . 
[3%] . [Hi], [HI]. In this paper, we are taking the definition proposed in [HI Subsect. 2.5]. 
One of the equivalent definitions of the coherent risk contribution $\rho^c(X; W)$ of X to W is

$$\rho^c(X; W) = \lim_{\varepsilon \downarrow 0} \varepsilon^{-1}\bigl(\rho(W + \varepsilon X) - \rho(W)\bigr).$$

Note that if X is small as compared to W, then

$$\rho(W + X) - \rho(W) \approx \rho^c(X; W).$$

In most typical situations $\rho^c$ admits the representation

$$\rho^c(X; W) = -E_Q X,$$

where Q is the extreme measure defined as $\operatorname{argmin}_{Q' \in \mathcal{D}} E_{Q'} W$ ($\mathcal{D}$ is the set standing
in (1.1)). Note that in this case $\rho^c$ is linear in X.
For the most important classes of coherent risk measures, risk contribution admits a
simple empirical estimation procedure. As shown in the paper, for Alpha V@R, $\rho^c$ has
the following form:

$$\rho^c_\alpha(X; W) = -E X_{\operatorname{argmin}_{i=1,\dots,\alpha} W_i},$$

where $(X_1, W_1), \dots, (X_\alpha, W_\alpha)$ are independent copies of (X, W). In particular, it is not
hard to compute this quantity for the typical case, where the firm's portfolio consists of
many assets, i.e. $W = \sum_{n=1}^{N} W^n$ with a large number N. A similar simple representation
is provided for the Beta V@R risk contribution.
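
The Alpha V@R risk contribution lends itself to the same resampling treatment as Alpha V@R itself: draw α joint observations of (X, W), find the one with the lowest firm P&L, and record the trade P&L on that observation. The Python sketch below is our illustration under the assumption that joint historical observations of X and W are available; the function name and default parameters are not from the paper. Ties in W are broken by taking the first draw, which is immaterial when W has a continuous distribution.

```python
import random


def alpha_var_contribution(x_sample, w_sample, alpha, n_trials=20000, seed=0):
    """Resampling estimate of the Alpha V@R risk contribution of X to W.

    x_sample[t] and w_sample[t] are the trade and firm P&L on the same day t.
    Each trial draws alpha days with replacement, picks the day on which the
    firm P&L is lowest, and records the trade P&L on that day.
    """
    if len(x_sample) != len(w_sample):
        raise ValueError("x_sample and w_sample must have equal length")
    rng = random.Random(seed)
    n = len(w_sample)
    total = 0.0
    for _ in range(n_trials):
        days = [rng.randrange(n) for _ in range(alpha)]
        worst_day = min(days, key=lambda t: w_sample[t])
        total += x_sample[worst_day]
    return -total / n_trials


# Hypothetical joint history of the firm's P&L and a trade's P&L.
firm = [-3.0, -1.0, 0.0, 2.0]
trade = [-0.3, 0.1, 0.0, 0.2]
contribution = alpha_var_contribution(trade, firm, alpha=2)
```

The estimate is linear in the trade P&L for a fixed firm history, and taking the trade equal to the firm recovers the resampling estimate of the firm's own Alpha V@R.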

We also provide formulas for the empirical estimation of risk contributions for the 
class of risk measures, which we call Weighted V@R (the term spectral risk measures is 
also used in the literature); this is a wide class containing, in particular, Beta V@R. 

We also study the properties of the coefficient 

which measures the tail correlation between X and W . 

6. Factor risk contribution. In this paper, we introduce the notion of factor risk
contribution. It is simply the risk contribution applied to the risk measure $\rho^f$, i.e.

$$\rho^{fc}(X; Y; W) = \lim_{\varepsilon \downarrow 0} \varepsilon^{-1}\bigl(\rho^f(W + \varepsilon X; Y) - \rho^f(W; Y)\bigr).$$

The quantity $\rho^{fc}(\,\cdot\,; Y; W)$ may be viewed as a coherent alternative to the sensitivity
coefficient. There are, however, two important differences:

• The sensitivity coefficient is a good estimate of risk provided that the change in the
corresponding factor is small (i.e. Y is small). On the other hand, the factor risk
contribution is a good estimate of risk provided that X is small as compared to W
and there are no requirements on the size of Y.

• The sensitivity coefficient measures the sensitivity of a portfolio to a market factor.
On the other hand, the factor risk contribution measures the sensitivity of a portfolio
X both to the market factor Y and to the firm's portfolio W. So, $\rho^{fc}(\,\cdot\,; Y; W)$
is a "firm-specific" coherent sensitivity to the factor Y.

As shown in the paper,

$$\rho^{fc}(X; Y; W) = \rho^c(E(X \mid Y); E(W \mid Y)).$$

This formula reduces the computation of $\rho^{fc}$ to two steps: calculating the functions
$f(y) = E(X \mid Y = y)$, $g(y) = E(W \mid Y = y)$ and computing $\rho^c$.
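
The two steps can be sketched as follows for the Alpha V@R case. The sketch assumes that the conditional-mean functions f and g are already available (for instance, from pricing models) and uses a resampling estimate of the risk contribution along the factor history; the function name, the example functions, and the data are hypothetical.

```python
import random


def factor_risk_contribution(f, g, factor_history, alpha, n_trials=20000, seed=0):
    """Resampling estimate of the Alpha V@R factor risk contribution.

    f(y) plays the role of E(X | Y = y) and g(y) of E(W | Y = y).
    """
    # Step 1: evaluate both conditional means along the factor history.
    fx = [f(y) for y in factor_history]
    gw = [g(y) for y in factor_history]
    # Step 2: risk contribution of f(Y) to g(Y), i.e. minus the mean of f
    # at the draw where g is lowest among alpha draws.
    rng = random.Random(seed)
    n = len(factor_history)
    total = 0.0
    for _ in range(n_trials):
        draws = [rng.randrange(n) for _ in range(alpha)]
        worst = min(draws, key=lambda t: gw[t])
        total += fx[worst]
    return -total / n_trials


# Hypothetical example: the firm is long the factor, the trade is a partial hedge.
history = [-2.0, -1.0, 0.0, 3.0]
hedge = factor_risk_contribution(lambda y: -0.5 * y, lambda y: y, history, alpha=2)
```

A trade whose conditional mean moves against the firm's exposure to the factor receives a negative contribution, which is the sense in which the quantity is specific to the firm.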

7. Risk sharing. The relevance of factor risk contributions becomes clear from 
our considerations of the risk sharing problem. One of the basic problems of the central 
management of a firm consisting of several desks is: How to impose the limits on the 
risk of each desk? Typically, this problem is approached as follows. By looking at the 
performance of each desk, the central management decides which desks should grow and 
which ones should shrink and chooses the risk limits accordingly. Nowadays, the procedure 
of choosing the risk limits is to a large extent a political one. At the same time, the central 
managers would like to have a quantitative rather than political procedure of choosing 
these limits. An idea proposed by practitioners is: Instead of giving each desk a fixed risk 
limit, it might be reasonable to allow the desks to trade the risk limits between themselves.
For example, if one desk is not using its risk limit completely, it might sell the excess risk 
limit to another desk, which needs it. 

In this paper we address the following problem: Is it possible to arrange a market of 
risk within a firm in such a way that the resulting competitive optimum would coincide 
with the global one? (By the global optimum we mean the one attained if the central 
management had possessed all the information available to all the desks and had been able 
to solve the corresponding global optimization problem.) The hope for a positive answer
is justified by a well-known equivalence established in the expected utility framework
between the global optimum (known also as the Pareto-type or the Soviet-type optimum)
and the competitive optimum (known also as the Arrow-Debreu-type or the Western-type
optimum); see [3U|. A coherent risk counterpart of this result was established in ^SJ 
Sect. 4]. The difference between these results and our setting is that here the objects 
traded are risk limits rather than financial contracts. The results of this paper (they are 
established within the coherent risk framework) are as follows: 

• If the desks are measuring their outstanding risks and are keeping them within the 
risk limits, then the competitive optimum is not the global one. 

• If the desks are keeping track of their risk contributions to the whole firm, then the 
competitive optimum coincides with the global one. 

Moreover, it turns out that the global optimum is achieved regardless of what the initial 
allocation of risk limits between the desks is. (By the initial allocation we mean the 
risk limits given to the desks by the central management before the desks start to trade 
their risk limits.) Our result applies not only to the case of one risk constraint on the 
firm's portfolio, but also to the case of several risk constraints (a typical example is the 
constraints on each factor risk). 

8. Structure of the paper. In essence, the paper consists of two parts. The first
part (Sections 2–4) deals with factor risk. In Section 2, we recall basic facts and examples
related to coherent risk measures. Section 3 deals with factor risk. In Section 4, we
study the problem of portfolio optimization under limits imposed on factor risks. The
second part (Sections 5–7) deals with (factor) risk contribution. In Section 5, we recall
basic facts related to risk contribution. Section 6 deals with factor risk contribution. In
Section 7, we study the problem of risk sharing between the desks of a firm. The Appendix
contains the $L^0$-version of the Kusuoka theorem, which is needed for some statements of
the paper and is of interest in itself. The recipes for the practical risk measurement are
gathered in Section 8, where we also compare various empirical risk estimation techniques
considered in this paper with the classical ones like parametric V@R, Monte Carlo V@R,
and historic V@R. The reader interested in practical applications only might proceed
directly to Section 8.

Acknowledgement. A. Cherny expresses his thanks to D. Heath and S. Hilden for 
the fruitful discussions related to the risk sharing problem. 

2 Coherent Risk 

1. Basic definitions and facts. Let $(\Omega, \mathcal{F}, P)$ be a probability space. It is convenient to
consider instead of coherent risk measures their opposites called coherent utility functions.
This enables one to get rid of numerous minus signs. Recall that $L^\infty$ is the space of
bounded random variables on $(\Omega, \mathcal{F}, P)$.
The following definition was introduced in [S], [T§] .

Definition 2.1. A coherent utility function on $L^\infty$ is a map $u : L^\infty \to \mathbb{R}$ satisfying
the properties:

(a) (Superadditivity) $u(X + Y) \ge u(X) + u(Y)$;

(b) (Monotonicity) If $X \le Y$, then $u(X) \le u(Y)$;

(c) (Positive homogeneity) $u(\lambda X) = \lambda u(X)$ for $\lambda \in \mathbb{R}_+$;

(d) (Translation invariance) $u(X + m) = u(X) + m$ for $m \in \mathbb{R}$;

(e) (Fatou property) If $|X_n| \le 1$, $X_n \to X$, then $u(X) \ge \limsup_n u(X_n)$.

The corresponding coherent risk measure is $\rho(X) = -u(X)$.

Remarks. (i) From the financial point of view, X means the P&L produced by some
portfolio over the unit time period (taken as the basis for risk measurement) and discounted
to the initial time. Actually, all the financial quantities in this paper are the
discounted ones. However, the unit time period is typically small (for example, one day),
and for such time horizons the discounted values are very close to the actual ones. For
this reason, below we skip the word "discounted".

(ii) The superadditivity property of u (= the subadditivity of $\rho$) has the following
financial meaning: if we have a portfolio consisting of several subportfolios and the risk
of each subportfolio is small, then the risk of the whole portfolio is small. V@R satisfies
all the conditions of the above definition except for the subadditivity, and this leads to
serious drawbacks of V@R. This can be illustrated by a simple example proposed in jl] .
Suppose that a portfolio consists of 25 subportfolios (corresponding to several agents), i.e.
its P&L X equals $\sum_{n=1}^{25} X^n$. Suppose that $X^n = I_{(A_n)^c} - 100 I_{A_n}$, where $P(A_n) = 1/25$
and $A_1, \dots, A_{25}$ are disjoint sets ($(A_n)^c$ denotes the complement of $A_n$). This means
that each agent employs a spiking strategy. If risk is measured by $\text{V@R}_{0.05}$, then the risk
of each subportfolio is negative (meaning that each subportfolio is extremely good from
the viewpoint of this risk measure), while the P&L produced by the whole portfolio is
identically equal to -76!
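
The numbers in this example can be checked mechanically. The Python sketch below builds the 25 equally likely states, takes A_n to be the n-th state, and computes V@R at level 0.05 for each subportfolio and for the total. It assumes the convention that V@R of order λ is minus the right λ-quantile of the P&L; the helper names are ours.

```python
from fractions import Fraction


def right_quantile(dist, lam):
    """Right lam-quantile inf{x : F(x) > lam} of a finite distribution.

    dist is a list of (value, probability) pairs.
    """
    cumulative = Fraction(0)
    for value, prob in sorted(dist):
        cumulative += prob
        if cumulative > lam:
            return value
    raise ValueError("lam must be smaller than 1")


def value_at_risk(dist, lam):
    """V@R of order lam, taken here as minus the right lam-quantile."""
    return -right_quantile(dist, lam)


N = 25
LAM = Fraction(5, 100)
STATE_PROB = Fraction(1, N)  # 25 equally likely states; A_n is the n-th state

# Subportfolio n gains 1 outside A_n and loses 100 on A_n.
sub_pnl = [[-100 if state == n else 1 for state in range(N)] for n in range(N)]
total_pnl = [sum(sub_pnl[n][state] for n in range(N)) for state in range(N)]

sub_var = [value_at_risk([(x, STATE_PROB) for x in pnl], LAM) for pnl in sub_pnl]
total_var = value_at_risk([(x, STATE_PROB) for x in total_pnl], LAM)

print(sub_var[0], sum(sub_var), total_var)  # prints: -1 -25 76
```

Each subportfolio has V@R equal to -1, so the sum of the individual risks is -25, whereas the V@R of the whole portfolio is 76: subadditivity fails.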

A natural example, in which V@R is subadditive, is the Gaussian case: if X, Y have
a jointly Gaussian distribution, then $\text{V@R}_\lambda(X + Y) \le \text{V@R}_\lambda(X) + \text{V@R}_\lambda(Y)$. But
actually on the set of (centered) Gaussian variables all the reasonable risk measures like
V@R, variance, any law invariant coherent risk measure (see Example 2.7) coincide up
to multiplication by a positive constant. However, for general random variables this is
not the case: V@R and variance exhibit serious drawbacks (see |Jj), and coherent risk
measures are really needed. □

The theorem below is the basic representation theorem. It was established in |Hj for
the case of a finite $\Omega$ (in this case the axiom (e) is not needed) and in ^1 for the general
case (the proof can also be found in [23 Cor. 4.35] or Cor. 1.17]). We denote by $\mathcal{P}$
the set of probability measures on $\mathcal{F}$ that are absolutely continuous with respect to P.
Throughout the paper, we identify measures from $\mathcal{P}$ (these are typically denoted by Q)
with their densities with respect to P (these are typically denoted by Z).

Theorem 2.2. A function u satisfies conditions (a)-(e) if and only if there exists a
non-empty set $\mathcal{D} \subseteq \mathcal{P}$ such that

$$u(X) = \inf_{Q \in \mathcal{D}} E_Q X, \quad X \in L^\infty. \qquad (2.1)$$
Remarks. (i) Let us emphasize that a coherent risk measure is defined on random
variables and not on their distributions. Using representation (2.1), it is easy to construct
an example of a risk measure $\rho$ and two random variables X and Y with Law X = Law Y,
but with $\rho(X) \ne \rho(Y)$. A particularly important subclass of coherent risk measures is the
class of law invariant ones, i.e. the risk measures $\rho$ that depend only on the distribution
of X. However, it would not be a good idea to include law invariance as the sixth axiom in
Definition 2.1. Indeed, the basic risk measure used by an agent is typically law invariant.
But there are many "derivative" risk measures like $\rho^f(\,\cdot\,; Y)$ (many examples of naturally
arising "derivative" risk measures can be found in ^3], ^H], |2H]), and these need
not be law invariant even if the basic risk measure $\rho$ is law invariant.

(ii) Coherent risk measures are primarily aimed at measuring risk. But they can be
used to measure the risk-adjusted performance as well. The risk-adjusted performance
based on coherent risk is a functional of the form $p(X) = EX - \lambda\rho(X)$, where $\rho$ is a
coherent risk measure and $\lambda \in \mathbb{R}_+$. (By E we will always denote the expectation with
respect to the original measure P.) Note that the functional $X \mapsto EX$ is a coherent
utility (it is sufficient to take $\mathcal{D} = \{P\}$ in (2.1)). Furthermore, if

$$u_n(X) = \inf_{Q \in \mathcal{D}_n} E_Q X, \quad n = 1, 2$$

are two coherent utilities and $\lambda \in [0, 1]$, then

$$\lambda u_1(X) + (1 - \lambda) u_2(X) = \inf_{Q \in \lambda\mathcal{D}_1 + (1-\lambda)\mathcal{D}_2} E_Q X$$

is also a coherent utility (we use the notation $\lambda\mathcal{D}_1 + (1-\lambda)\mathcal{D}_2 = \{\lambda Q_1 + (1-\lambda) Q_2 : Q_n \in \mathcal{D}_n\}$).
Thus, $(1 + \lambda)^{-1} p(X) = (1 + \lambda)^{-1} EX + \lambda (1 + \lambda)^{-1} u(X)$, where $u = -\rho$, is again a
coherent utility, which is a very convenient "stability"
feature. As a result, coherent utility/risk can be used

1. to measure risk; 

2. to measure the risk-adjusted performance, i.e. utility. 

(iii) Coherent utility may serve as a substitute for the classical expected utility. The 
techniques like utility-indifference pricing, utility-based optimization, utility-based equi- 
librium, etc. can be transferred from the expected utility framework to the coherent 
utility framework. Note that the intersection of these two classes of utility is trivial: it
consists only of the functional $X \mapsto EX$ (note that all the other expected utilities do not
satisfy the translation invariance property).

(iv) A generalization of coherent utility is the notion of concave utility introduced by
Föllmer and Schied [27| , [2H] . A functional $u : L^\infty \to \mathbb{R}$ is a concave utility function if it
satisfies the axioms (b), (d), (e) of Definition 2.1 as well as the condition

(a') (Concavity) $u(\lambda X + (1 - \lambda)Y) \ge \lambda u(X) + (1 - \lambda)u(Y)$ for $\lambda \in [0, 1]$.

The corresponding convex risk measure is $\rho(X) = -u(X)$. The representation theorem
states that u is a concave utility function if and only if there exists a non-empty set $\mathcal{D} \subseteq \mathcal{P}$
and a function $\alpha : \mathcal{D} \to \mathbb{R}$ such that

$$u(X) = \inf_{Q \in \mathcal{D}} \bigl(E_Q X + \alpha(Q)\bigr), \quad X \in L^\infty$$

(the proof can be found in [2Z|, [201 Th. 4.31], or jHSl Th. 1.13]). A natural example of a
concave utility function is

$$u(X) = \sup\Bigl\{m \in \mathbb{R} : \inf_{Q \in \mathcal{D}} E_Q U(X - m) \ge x_0\Bigr\}, \quad X \in L^\infty,$$

where $U : \mathbb{R} \to \mathbb{R}$ is a concave increasing function, $\mathcal{D} \subseteq \mathcal{P}$, and $x_0 \in \mathbb{R}$ is a fixed
threshold. This object is closely connected with the robust version of the Savage theory
developed by Gilboa and Schmeidler [HI]. For a detailed study of these concave utilities,
see |2S1, j2Hl Sect. 4.9], or Sect. 1.6].

The theory of convex risk measures is now a rather large field. However, in applications 
to problems of finance like pricing and optimization, coherent risk measures turn out to 
be much more convenient than the convex ones. For this reason, we will consider only 
coherent risk measures. □ 

So far, a coherent risk measure has been defined on bounded random variables. Let us 
ask ourselves the following question: Are "financial" random variables like the increment 
of a price of some asset indeed bounded? The right way to address this question is to 
split it into two parts: 

• Are "financial" random variables bounded in practice? 

• Are "financial" random variables bounded in theory? 

The answer to the first question is positive (clearly, everything is bounded by the number
of the atoms in the universe). The answer to the second question is negative because
most distributions used in theory (like the lognormal one) are unbounded. So, as we
are dealing with theory, we need to extend coherent risk measures to the space $L^0$ of all
random variables.

It is hopeless to axiomatize the notion of a risk measure on $L^0$ and then to obtain
the corresponding representation theorem. Instead, following ^3], we take representation
(2.1) as the basis and extend it to $L^0$.

Definition 2.3. A coherent utility function on $L^0$ is a map $u : L^0 \to [-\infty, \infty]$ defined
as

$$u(X) = \inf_{Q \in \mathcal{D}} E_Q X, \quad X \in L^0, \qquad (2.2)$$

where $\mathcal{D}$ is a non-empty subset of $\mathcal{P}$ and $E_Q X$ is understood as $E_Q X^+ - E_Q X^-$
($X^+ = \max\{X, 0\}$, $X^- = \max\{-X, 0\}$) with the convention $\infty - \infty = -\infty$. (Throughout
the paper, all the expectations are understood in this way.)

Remark. This way of defining coherent utility has parallels to what is done with the
classical expected utility. Namely, the von Neumann-Morgenstern representation shows
that (appropriately axiomatized) investor's preferences are described by $EU(X)$ with a
bounded function U. Then one typically takes a concave increasing unbounded function
$U : \mathbb{R} \to \mathbb{R}$ and defines the preferences by $EU(X)$. □

Clearly, different sets $\mathcal{D}$ might define the same coherent utility (for example, $\mathcal{D}$ and
its convex hull define the same function u). However, among all the sets $\mathcal{D}$ defining the
same u there exists the largest one. It is given by $\{Q \in \mathcal{P} : E_Q X \ge u(X) \text{ for any } X\}$.

Definition 2.4. We will call the largest set, for which (2.1) (resp., (2.2)) is true, the
determining set of u.

Remarks. (i) Let $\rho'$ be another coherent risk measure with the determining set $\mathcal{D}'$. Clearly,

$$\mathcal{D}' \supseteq \mathcal{D} \implies \rho' \ge \rho.$$

In other words, the size of $\mathcal{D}$ controls the risk aversion of $\rho$.
(ii) The determining set is convex. For coherent utilities on $L^\infty$, it is also $L^1$-closed;
for coherent utilities on $L^0$ this need not be so (for the corresponding example, see Subsect. 2.1]).
In particular, the determining
set of a coherent utility on $L^0$ and the determining set of its restriction to $L^\infty$ might be
different.

(iii) Let $\mathcal{D}$ be an $L^1$-closed convex subset of $\mathcal{P}$. (Let us note that a particularly
important case is where $\mathcal{D}$ is $L^1$-closed, convex, and uniformly integrable; this condition
will be needed in a number of places below.) Define a coherent utility u by (2.2). Then $\mathcal{D}$
is the determining set of u. Indeed, assume that the determining set $\widetilde{\mathcal{D}}$ is larger than $\mathcal{D}$,
i.e. there exists $Q_0 \in \widetilde{\mathcal{D}} \setminus \mathcal{D}$. Then, by the Hahn-Banach theorem, we can find $X_0 \in L^\infty$
such that $E_{Q_0} X_0 < \inf_{Q \in \mathcal{D}} E_Q X_0 = u(X_0)$, which contradicts $Q_0 \in \widetilde{\mathcal{D}}$. The same argument shows that
$\mathcal{D}$ is also the determining set of the restriction of u to $L^\infty$. □

In what follows, we will always consider coherent utility functions on L° . 

2. Examples. Let us now provide several natural examples of coherent risk measures. 

Example 2.5 (Tail V@R). Let $\lambda\in(0,1]$ and consider

$$\mathcal{D}_\lambda = \Bigl\{Q\in\mathcal{P} : \frac{dQ}{dP} \le \lambda^{-1}\Bigr\}. \qquad (2.3)$$



The corresponding coherent risk measure is called Tail V@R (the terms Average V@R,
Conditional V@R, Expected Shortfall, and Expected Tail Loss are also used). Let us denote
it by $\rho_\lambda$ and the corresponding coherent utility by $u_\lambda$.
Clearly,

$$\lambda' \le \lambda \implies \rho_{\lambda'} \ge \rho_\lambda,$$

so that $\lambda$ serves as the risk aversion parameter. We have

$$\rho_\lambda(X) \xrightarrow[\lambda\downarrow 0]{} -\operatorname*{essinf}_\omega X(\omega)$$

(recall that $\operatorname{essinf}_\omega X(\omega) := \sup\{x : X \ge x \text{ a.s.}\}$). The right-hand side of this relation is
the most severe risk measure (it is easy to see that any coherent risk measure $\rho$ satisfies
the inequality $\rho(X) \le -\operatorname{essinf}_\omega X(\omega)$). Furthermore, $\rho_1(X) = -EX$, which is the most
liberal risk measure (it is seen from Theorem A.1 that any law invariant risk measure $\rho$
satisfies the inequality $\rho(X) \ge -EX$).

Let us provide a more explicit representation of $u_\lambda$. Set

$$Z_* = \lambda^{-1} I(X < q_\lambda(X)) + \frac{1-\lambda^{-1}P(X<q_\lambda(X))}{P(X=q_\lambda(X))}\, I(X = q_\lambda(X))$$

(the second term is understood as 0 if $P(X = q_\lambda(X)) = 0$).
Throughout the paper, $q_\lambda$ will denote the right $\lambda$-quantile, i.e.
$q_\lambda(X) = \inf\{x : F(x) > \lambda\}$, where $F$ is the distribution function of $X$. Then
$Z_* \ge 0$, $EZ_* = 1$ and, for any $Z \in \mathcal{D}_\lambda$, we have

$$EXZ - EXZ_* = E(X-q_\lambda(X))Z - E(X-q_\lambda(X))Z_*$$
$$= E\bigl[(X-q_\lambda(X))(Z-\lambda^{-1})I(X-q_\lambda(X)<0) + (X-q_\lambda(X))ZI(X-q_\lambda(X)>0)\bigr] \ge 0.$$

Hence,

$$u_\lambda(X) = EXZ_* = \lambda^{-1}\int_{(-\infty,q_\lambda(X))} x\,Q(dx) + \bigl(1-\lambda^{-1}P(X<q_\lambda(X))\bigr)q_\lambda(X), \qquad (2.4)$$

where $Q = \mathrm{Law}\,X$. In particular, it follows that $u_\lambda$ is law invariant, i.e. it depends
only on the distribution of $X$. If $X$ has a continuous distribution, then this formula
simplifies to:

$$u_\lambda(X) = E(X \mid X \le q_\lambda(X)).$$

Using (2.4), one easily gets an equivalent representation of $u_\lambda$:

$$u_\lambda(X) = \lambda^{-1}\int_0^\lambda q_x(X)\,dx. \qquad (2.5)$$

It is seen from this representation that $\rho_\lambda(X) \ge \mathrm{V@R}_\lambda(X)$. The advantage of Tail
V@R over V@R is that it takes into account the heaviness of the $\lambda$-tail (see Figure 1).
Kusuoka [H] proved that on $L^\infty$, $\rho_\lambda$ is the smallest law invariant coherent risk measure
that dominates $\mathrm{V@R}_\lambda$ (the proof can also be found in Th. 4.61] or Th. 1.48]).
This suggests that Tail V@R might be the most important subclass of coherent
risk measures. However, there exists a risk measure which is, in our opinion, much better
than Tail V@R. This is the risk measure of the next example. □
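
As an illustration of representation (2.5) and of the inequality between Tail V@R and V@R, the following minimal Python sketch evaluates both on the empirical distribution of a sample with equal weights. The function names and the two toy samples are assumptions made for the illustration; they are not taken from the paper.

```python
import numpy as np

def empirical_var(sample, lam):
    """V@R_lam = -q_lam, with q_lam the right lam-quantile inf{x : F(x) > lam}
    of the empirical distribution (equal weights 1/T)."""
    xs = np.sort(np.asarray(sample, dtype=float))
    T = len(xs)
    # smallest 0-based index k with (k + 1) / T > lam; clipped for lam = 1
    k = min(int(np.floor(lam * T + 1e-12)), T - 1)
    return float(-xs[k])

def empirical_tail_var(sample, lam):
    """Tail V@R via (2.5): rho_lam = -(1/lam) * integral_0^lam q_x dx."""
    xs = np.sort(np.asarray(sample, dtype=float))
    T = len(xs)
    # q_x equals xs[k] on [k/T, (k+1)/T); clip each interval to [0, lam]
    grid = np.minimum(np.arange(T + 1) / T, lam)
    return float(-np.dot(xs, np.diff(grid)) / lam)

# Two toy samples with the same 0.2-quantile but different 0.2-tails.
heavy_tail = [-10, -1, 0, 1, 2, 3, 4, 5, 6, 7]
light_tail = [-2, -1, 0, 1, 2, 3, 4, 5, 6, 7]
for name, data in (("heavy", heavy_tail), ("light", light_tail)):
    print(name, empirical_var(data, 0.2), empirical_tail_var(data, 0.2))
```

Both samples have the same V@R at level 0.2, while Tail V@R distinguishes them, which is the effect described by Figure 1.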




Figure 1. These two distributions have the same $\lambda$-quantiles ($\lambda$ is
fixed), so that $\mathrm{V@R}_\lambda$ is the same for them. However, the distribution
at the right is clearly "better" than the distribution at the left.



Example 2.6 (Weighted V@R). Let $\mu$ be a probability measure on $(0,1]$.
Weighted V@R with the weighting measure $\mu$ (the term spectral risk measure is also
used) is the coherent risk measure corresponding to the coherent utility function

$$u_\mu(X) = \int_{(0,1]} u_\lambda(X)\,\mu(d\lambda),$$

where $\int f(x)\,\mu(dx)$ is understood as $\int f^+(x)\,\mu(dx) - \int f^-(x)\,\mu(dx)$ with the convention
$\infty-\infty=-\infty$. (Throughout the paper, all the integrals are understood in this way.) One
can check that $u_\mu$ is indeed a coherent utility (see pHl Sect. 3] for details). The measure
$\mu$ reflects the risk aversion of $\rho_\mu$: the more mass $\mu$ assigns to the left part of
$(0,1]$, the more risk averse $\rho_\mu$ is.

Let us give two arguments in favor of Weighted V@R over Tail V@R: 

• (Financial argument) Tail V@R of order $\lambda$ takes into consideration only the $\lambda$-tail of
the distribution of $X$; thus, two distributions with the same $\lambda$-tail will be assessed
by this measure in the same way, although one of them might be clearly better than
the other (see Figure 2). On the other hand, if the right endpoint of the support
of $\mu$ is 1, then $\rho_\mu$ depends on the whole distribution of $X$. The paper [21] provides
some further financial arguments in favor of Weighted V@R.

• (Mathematical argument) If the support of $\mu$ is the whole $[0,1]$, then $\rho_\mu$ possesses
some nice properties that are not shared by $\rho_\lambda$. In particular, if $X$ and $Y$ are not
comonotone (for the definition, see Section EJ), then $\rho_\mu(X+Y) < \rho_\mu(X) + \rho_\mu(Y)$
(the proof can be found in [TBJ Sect. 5], where this was called the strict diversification
property). This property is important because it leads to the uniqueness
of a solution of several optimization problems based on Weighted V@R (see fBJ
Sect. 5]).

To put it briefly, Weighted V@R is "smoother" than Tail V@R. 




Figure 2. These two distributions have the same $\lambda$-tails ($\lambda$ is fixed),
so that $\mathrm{TV@R}_\lambda$ is the same for them. However, the distribution at
the right is clearly "better" than the distribution at the left.



Weighted V@R is law invariant because Tail V@R possesses this property.
Kusuoka [JT] proved that on $L^\infty$ the class of Weighted V@Rs is exactly the class of
law invariant coherent risk measures satisfying the additional property of comonotonicity,
which means that $\rho(X+Y) = \rho(X) + \rho(Y)$ for any comonotone (the definition is recalled
in Section EJ) random variables $X$ and $Y$ (the proof can also be found in [23 Th. 4.87]
or jS3 Th. 1.58]).

Let us now provide several equivalent representations of Weighted V@R. It follows
from (2.5) that

$$u_\mu(X) = \int_{(0,1]} \lambda^{-1}\int_0^\lambda q_x(X)\,dx\,\mu(d\lambda) = \int_0^1 q_x(X)\psi_\mu(x)\,dx, \qquad (2.6)$$

where

$$\psi_\mu(x) = \int_{[x,1]} \lambda^{-1}\mu(d\lambda), \quad x\in[0,1]. \qquad (2.7)$$

The last formula establishes a one-to-one correspondence between the left-continuous
decreasing functions $\psi\colon [0,1]\to[0,\infty]$ with $\int_0^1\psi(x)\,dx = 1$ and the probability measures
on $(0,1]$.

In particular, let $\Omega = \{1,\dots,T\}$ and $X(t) = x_t$. Let $x_{(1)},\dots,x_{(T)}$ be the values
$x_1,\dots,x_T$ in the increasing order. Define $n(t)$ through the equality $x_{(t)} = x_{n(t)}$. Then

$$u_\mu(X) = \sum_{t=1}^T x_{n(t)} \int_{z_{t-1}}^{z_t} \psi_\mu(x)\,dx, \qquad (2.8)$$

where $z_t = \sum_{i=1}^t P\{n(i)\}$ (so that $z_0 = 0$). This formula yields a simple procedure for the
empirical estimation of $u_\mu(X)$.

In order to provide another representation of $u_\mu$, consider the function

$$\Psi_\mu(x) = \int_0^x \psi_\mu(y)\,dy = \int_0^x\int_{(y,1]} \lambda^{-1}\mu(d\lambda)\,dy, \quad x\in[0,1]. \qquad (2.9)$$

It is easy to see that $\Psi_\mu\colon[0,1]\to[0,1]$ is increasing, concave, continuous, $\Psi_\mu(0)=0$,
and $\Psi_\mu(1) = 1$. In fact, (2.9) establishes a one-to-one correspondence between the
functions with these properties and the probability measures $\mu$ on $(0,1]$ (for details, see [23
Lem. 4.63] or |55| Lem. 1.50]). The inverse map $\Psi\mapsto\mu$ is given by $\mu = -x\Psi''$, where
$\Psi''$ is the second derivative of $\Psi$ taken in the sense of distributions, i.e. it is the measure
on $(0,1]$ defined by $\Psi''((a,b]) := \Psi'_+(b) - \Psi'_+(a)$, where $\Psi'_+$ is the right-hand derivative.
Let $F$ be the distribution function of $X$. As the function $x\mapsto q_x(X)$ is constant on the
intervals of the form $[F(y-),F(y))$, we can derive from (2.5) the following representation:

$$u_\mu(X) = \int_0^1 q_x(X)\,d\Psi_\mu(x) = \int_{\mathbb{R}} q_{F(x)}(X)\,d\Psi_\mu(F(x)) = \int_{\mathbb{R}} x\,d\Psi_\mu(F(x)) = EY, \qquad (2.10)$$

where $Y$ is a random variable with the distribution function $\Psi_\mu\circ F$. This representation
provides a convenient tool for designing particular risk measures. Let us remark that
the functionals of the form (2.10) were considered by actuaries under the name distorted
measures already in the early 90s, i.e. before the papers of Artzner, Delbaen, Eber, and
Heath; see, for example, [22], [H2j (see also the paper which appeared at the same 
time as 

If $X$ has a continuous distribution, we get from (2.10) one more representation:

$$u_\mu(X) = \int_{\mathbb{R}} x\,\psi_\mu(F(x))\,dF(x) = EX\psi_\mu(F(X)) = E_{Q_\mu(X)}X, \qquad (2.11)$$

where $Q_\mu(X) = \psi_\mu(F(X))P$. Note that

$$E\psi_\mu(F(X)) = \int_{\mathbb{R}} \psi_\mu(F(x))\,dF(x) = \int_0^1 \psi_\mu(x)\,dx = \int_{(0,1]}\int_0^\lambda \lambda^{-1}\,dx\,\mu(d\lambda) = 1,$$

so that $Q_\mu(X)$ is a probability measure. As $\psi_\mu\circ F$ is decreasing, this measure attributes
more mass to the outcomes corresponding to low values of $X$. Thus, $Q_\mu(X)$ reflects the
risk aversion of an agent who possesses the position that yields the P&L $X$.
The determining set $\mathcal{D}_\mu$ of $u_\mu$ admits the following representation:

$$\mathcal{D}_\mu = \{Z\in L^0 : Z\ge 0,\ EZ = 1, \text{ and } E(Z-x)^+ \le \Phi_\mu(x)\ \forall x\in\mathbb{R}_+\}, \qquad (2.12)$$

where

$$\Phi_\mu(x) = \sup_{y\in[0,1]} (\Psi_\mu(y) - xy), \quad x\in\mathbb{R}_+ \qquad (2.13)$$

(see [16; Th. 4.6]). Note that $\Phi_\mu\colon\mathbb{R}_+\to\mathbb{R}_+$ is continuous, decreasing, convex, $\Phi_\mu(0) = 1$,
$\Phi_\mu(x) > 0$ for $x < \int_{(0,1]}\lambda^{-1}\,d\mu$, and $\Phi_\mu(x) = 0$ for $x \ge \int_{(0,1]}\lambda^{-1}\,d\mu$.
One more representation of $\mathcal{D}_\mu$ is:

$$\mathcal{D}_\mu = \{Q\in\mathcal{P} : Q(A) \le \Psi_\mu(P(A)) \text{ for any } A\in\mathcal{F}\}$$
$$= \Bigl\{Z\in L^0 : Z\ge 0,\ E_P Z = 1, \text{ and } \int_{1-x}^1 q_s(Z)\,ds \le \Psi_\mu(x)\ \forall x\in[0,1]\Bigr\}.$$




It was obtained in (the proof can also be found in fl§\ Th. 4.73] or [531 Th. 1.53]). It 
is seen from this representation that

$$\Psi_{\mu'} \ge \Psi_\mu \implies \rho_{\mu'} \ge \rho_\mu. \qquad (2.14)$$

Moreover, it is easy to see that

$$\mu' \preceq \mu \implies \rho_{\mu'} \ge \rho_\mu, \qquad (2.15)$$

where the notation $\mu'\preceq\mu$ means that $\mu$ stochastically dominates $\mu'$, i.e. their
distribution functions satisfy $F_{\mu'} \ge F_\mu$. In order to prove (2.15), it is sufficient to notice that
$u_\lambda$ is increasing in $\lambda$ and $u_\mu(X) = \widetilde{E}u_\xi(X)$, where $\xi$ is a random variable on some space
$(\widetilde\Omega,\widetilde{\mathcal{F}},\widetilde{P})$ with $\mathrm{Law}\,\xi = \mu$. One should also use the following well-known fact (see [HZ|
§ 1.A]): $\mu'\preceq\mu$ if and only if on some space there exist $\xi'\le\xi$ with $\mathrm{Law}\,\xi = \mu$, $\mathrm{Law}\,\xi' = \mu'$.
For more information on Weighted V@R, see PQ, [2], JSj, [21]. □

Weighted V@R is a particular case of the more general class of risk measures that is 
described in the next example. 

Example 2.7 (Law invariant risk measures). A risk measure $\rho$ is law invariant
if $\rho(X) = \rho(Y)$ whenever $\mathrm{Law}\,X = \mathrm{Law}\,Y$. Kusuoka (Hj proved that a coherent risk
measure $\rho$ on $L^\infty$ is law invariant if and only if it has the form

$$\rho(X) = \sup_{\mu\in\mathfrak{M}} \rho_\mu(X)$$

with some set $\mathfrak{M}$ of probability measures on $(0,1]$ (the proof can also be found in [2TJ|
Cor. 4.58] or [S2 Cor. 1.45]). Theorem A.1 extends this result to risk measures on $L^0$. □

An interesting two-parameter family of coherent risk measures is provided by the 
example below. 

Example 2.8 (Moment-based risk measures). Let $p\in[1,\infty]$ and $\alpha\in[0,1]$.
Consider the set

$$\mathcal{D} = \{1 + \alpha(Z - EZ) : Z\ge 0,\ \|Z\|_q \le 1\},$$

where $q = p/(p-1)$ and $\|Z\|_q = (EZ^q)^{1/q}$. Then the corresponding coherent risk measure
has the form

$$\rho(X) = -EX + \alpha\|(X-EX)^-\|_p$$

(see (213 Sect. 4]). In particular, if $p = 2$, $\alpha = 1$, and $EX = 0$, then $\rho(X)$ is the
semivariance of $X$. Semivariance was proposed by Markowitz [IHj as a substitute for
variance. Its advantage is that it measures genuine risk (i.e. the downside); its disadvantage
is that it is less convenient analytically than variance. □

The moment-based risk measures are law invariant, but they do not belong to the class 
of Weighted V@Rs, which is a very convenient class. Weighted V@Rs are parametrized
by the probability measures $\mu$ on $(0,1]$, which is a huge class. For practical purposes,
one needs to select a convenient finite-parameter subclass of Weighted V@Rs. The most
natural parametric family of probability measures on [0, 1] is the family of Beta distri- 
butions. It turns out that the family of corresponding Weighted V@Rs admits a very 
natural interpretation and a very simple estimation procedure. We call these measures 
Beta V@Rs in accordance with Beta distributions. 

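Example 2.9 (Beta V@R). Let $\alpha\in(-1,\infty)$, $\beta\in(-1,\alpha)$. Beta V@R with
parameters $\alpha$, $\beta$ is the Weighted V@R with the weighting measure

$$\mu_{\alpha,\beta}(dx) = B(\beta+1,\alpha-\beta)^{-1}x^\beta(1-x)^{\alpha-\beta-1}\,dx, \quad x\in[0,1].$$

This risk measure will be denoted as $\rho_{\alpha,\beta}$ and the corresponding coherent utility will be
denoted as $u_{\alpha,\beta}$.

It follows from (2.15) that

$$\alpha' \ge \alpha \implies \mu_{\alpha',\beta} \preceq \mu_{\alpha,\beta} \implies \rho_{\alpha',\beta} \ge \rho_{\alpha,\beta}, \qquad (2.16)$$

$$\beta' \ge \beta \implies \mu_{\alpha,\beta'} \succeq \mu_{\alpha,\beta} \implies \rho_{\alpha,\beta'} \le \rho_{\alpha,\beta}. \qquad (2.17)$$

Furthermore,

$$\mu_{\alpha,\beta} \xrightarrow[\alpha\to\infty]{} \delta_0 \implies \rho_{\alpha,\beta}(X) \xrightarrow[\alpha\to\infty]{} -\operatorname*{essinf}_\omega X(\omega),$$

$$\mu_{\alpha,\beta} \xrightarrow[\beta\uparrow\alpha]{} \delta_1 \implies \rho_{\alpha,\beta}(X) \xrightarrow[\beta\uparrow\alpha]{} -EX,$$

where $\delta_a$ denotes the delta-mass concentrated at $a$. In particular, these relations show
that we can redefine $\rho_{\alpha,\alpha}(X)$ as $-EX$.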

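Suppose now that $\alpha,\beta\in\mathbb{N}$. Let $X_1,\dots,X_\alpha$ be independent copies of $X$,
$X_{(1)},\dots,X_{(\alpha)}$ be the corresponding order statistics, and $\xi$ be an independent random
variable uniformly distributed on $\{1,\dots,\beta\}$. Let us prove that

$$\rho_{\alpha,\beta}(X) = -EX_{(\xi)} = -E\Bigl[\frac{1}{\beta}\sum_{i=1}^\beta X_{(i)}\Bigr]. \qquad (2.18)$$

The second equality is obvious, so we should prove only the first one. The random
variables $X_i$ can be realized as $X_i = F^{-1}(U_i)$, where $F$ is the distribution function of $X$
and $U_1,\dots,U_\alpha$ are independent random variables uniformly distributed on $[0,1]$. Let
$U_{(1)},\dots,U_{(\alpha)}$ denote their order statistics. We have (see (23 Ch. 1, § 7])

$$\frac{d}{dx}P(U_{(i)}\le x) = \alpha C_{\alpha-1}^{i-1}x^{i-1}(1-x)^{\alpha-i}, \quad x\in(0,1),$$

so that

$$\frac{d}{dx}P(U_{(\xi)}\le x) = \frac{1}{\beta}\sum_{i=1}^\beta \frac{d}{dx}P(U_{(i)}\le x) = \frac{\alpha}{\beta}\sum_{i=1}^\beta C_{\alpha-1}^{i-1}x^{i-1}(1-x)^{\alpha-i}, \quad x\in(0,1),$$

and, differentiating once more (the sum telescopes),

$$\frac{d^2}{dx^2}P(U_{(\xi)}\le x) = -\frac{\alpha(\alpha-\beta)}{\beta}C_{\alpha-1}^{\beta-1}x^{\beta-1}(1-x)^{\alpha-\beta-1}, \quad x\in(0,1).$$

For the function $\Psi_{\alpha,\beta} := \Psi_{\mu_{\alpha,\beta}}$, we have

$$\Psi''_{\alpha,\beta}(x) = -B(\beta+1,\alpha-\beta)^{-1}x^{\beta-1}(1-x)^{\alpha-\beta-1} = \frac{d^2}{dx^2}P(U_{(\xi)}\le x), \quad x\in(0,1).$$

Moreover, the functions $\Psi_{\alpha,\beta}$ and $P(U_{(\xi)}\le\cdot)$ coincide at 0 and at 1. Consequently, these
functions are equal, and therefore,

$$P(X_{(\xi)}\le x) = P(F^{-1}(U_{(\xi)})\le x) = P(U_{(\xi)}\le F(x)) = \Psi_{\alpha,\beta}(F(x)), \quad x\in\mathbb{R}.$$

Recalling (2.10), we obtain (2.18).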

It is seen from the calculations given above that for Beta V@R the function
$\psi_{\alpha,\beta} := \psi_{\mu_{\alpha,\beta}}$ has the form

$$\psi_{\alpha,\beta}(x) = \Psi'_{\alpha,\beta}(x) = \frac{d}{dx}P(U_{(\xi)}\le x) = \frac{\alpha}{\beta}\sum_{i=1}^\beta C_{\alpha-1}^{i-1}x^{i-1}(1-x)^{\alpha-i}, \quad x\in(0,1). \qquad (2.19)$$

Let us remark that, according to (2.5), Tail V@R admits the following representation:
$\rho_\lambda(X) = -\widetilde{E}q_\xi(X)$, where $\xi$ is a random variable on some space $(\widetilde\Omega,\widetilde{\mathcal{F}},\widetilde{P})$ with the uniform
distribution on $[0,\lambda]$. Formula (2.18) can be rewritten as follows: $\rho_{\alpha,\beta}(X) = -\widetilde{E}q_\xi(\widetilde{X})$,
where $\widetilde{X}$ is a random variable distributed according to the empirical distribution
constructed by $X_1,\dots,X_\alpha$, $\xi$ is an independent random variable with the uniform
distribution on $[0,\beta/\alpha]$, and $\widetilde{E}$ means the averaging over empirical distributions. □

An important one-parameter family of coherent risk measures is obtained from Beta
V@R by fixing the value $\beta = 1$.

Example 2.10 (Alpha V@R). Let $\alpha\in(1,\infty)$. Alpha V@R of order $\alpha$ is Beta
V@R of order $(\alpha,1)$. It is seen from (2.16) that $\alpha$ measures the risk aversion of $\rho_\alpha$.
Furthermore, it follows from (2.18) that, for $\alpha\in\mathbb{N}$,

$$\rho_\alpha(X) = -E\min_{i=1,\dots,\alpha} X_i, \qquad (2.20)$$

where $X_1,\dots,X_\alpha$ are independent copies of $X$. □

The classes of risk measures described by Examples 2.5–2.10 are related by the
following diagram:

    Tail V@R  ⊂  Weighted V@R  ⊂  Law-invariant risk measures  ⊂  Coherent risk measures
                      ∪                        ∪
                  Beta V@R        Moment-based risk measures
                      ∪
                  Alpha V@R

In our opinion, the best classes are: Weighted V@R, Beta V@R, and Alpha V@R. All the
empirical estimation procedures considered in the paper will be provided for these three
classes.



3. Further examples. Coherent risk measures are primarily intended to assess the 
risk of non-Gaussian P&Ls. However, as an example, it is interesting to look at their 
values in the Gaussian case. 

Example 2.11. (i) Let $u$ be a law invariant coherent utility that is finite on Gaussian
random variables. It is easy to see that then there exists $\gamma\in\mathbb{R}_+$ such that, for any
Gaussian random variable $X$ with mean $m$ and variance $\sigma^2$, we have $u(X) = m - \gamma\sigma$. In
particular, if $X$ is a $d$-dimensional Gaussian random vector with mean 0 and covariance
matrix $C$, then $\rho(\langle h,X\rangle) = \gamma\langle h,Ch\rangle^{1/2}$, $h\in\mathbb{R}^d$.

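(ii) For Tail V@R, we get an explicit form of the constant $\gamma$:

$$\gamma(\lambda) = -\frac{1}{\lambda\sqrt{2\pi}}\int_{-\infty}^{q_\lambda} xe^{-x^2/2}\,dx = \frac{1}{\lambda\sqrt{2\pi}}\int_{q_\lambda^2/2}^\infty e^{-y}\,dy = \frac{1}{\lambda\sqrt{2\pi}}\,e^{-q_\lambda^2/2},$$

where $q_\lambda$ is the $\lambda$-quantile of the standard normal distribution (in order to check the
second equality, one should consider separately the cases $\lambda\le 1/2$ and $\lambda\ge 1/2$).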
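
The closed form of $\gamma(\lambda)$ is easy to evaluate numerically. The sketch below is only an illustration: it uses the standard-library `statistics.NormalDist` for the normal density and quantile, and the function names are assumptions introduced here, not notation from the paper.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution

def gamma_tail(lam):
    """gamma(lam) = exp(-q_lam^2 / 2) / (lam * sqrt(2 * pi)) = pdf(q_lam) / lam."""
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must lie in (0, 1]")
    if lam == 1.0:
        # rho_1(X) = -EX, so the Gaussian constant vanishes
        return 0.0
    q = _STD_NORMAL.inv_cdf(lam)
    return _STD_NORMAL.pdf(q) / lam

def gaussian_tail_var(mean, sigma, lam):
    """Tail V@R of a Gaussian P&L: rho(X) = -u(X) = -mean + gamma(lam) * sigma."""
    return -mean + gamma_tail(lam) * sigma
```

For instance, `gamma_tail(0.5)` equals $\sqrt{2/\pi}$, since $q_{1/2} = 0$.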




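Figure 4. The form of $\gamma(\lambda)$

(iii) For Beta V@R, we have from (ii):

$$\gamma(\alpha,\beta) = \frac{1}{\sqrt{2\pi}\,B(\beta+1,\alpha-\beta)}\int_0^1 \lambda^{\beta-1}(1-\lambda)^{\alpha-\beta-1}e^{-q_\lambda^2/2}\,d\lambda.$$

[Plot: $\gamma(\alpha,1),\dots,\gamma(\alpha,5)$ as functions of $\alpha$]
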
Figure 5. The form of $\gamma(\alpha,\beta)$

Let us also give a nice credit risk example, which illustrates the effect of coherent risk
diversification achieved in large portfolios. The example is borrowed from [20].

Example 2.12. Let $\mu$ be a measure on $(0,1]$ such that $\int_{(0,1]}\lambda^{-1}\mu(d\lambda) < \infty$ (for
example, the weighting measures of Tail V@R, Beta V@R with $\beta > 0$, and Alpha V@R
satisfy this condition). Let $X_1, X_2, \dots$ be independent identically distributed integrable
random variables. Set $S_n = X_1+\dots+X_n$. By the law of large numbers, $S_n/n \xrightarrow{L^1} EX$.
Consequently, $u_\lambda(S_n/n)\to EX$ for any $\lambda\in(0,1]$. Using the estimate $|u_\lambda(\xi)| \le \lambda^{-1}E|\xi|$
and the Lebesgue dominated convergence theorem, we get

$$u_\mu\Bigl(\frac{S_n}{n}\Bigr) = \int_{(0,1]} u_\lambda\Bigl(\frac{S_n}{n}\Bigr)\mu(d\lambda) \xrightarrow[n\to\infty]{} \int_{(0,1]} u_\lambda(EX)\,\mu(d\lambda) = EX.$$

This result admits the following interpretation. If a firm gives many loans of size $L$ to
independent identical customers (so that the $n$-th customer returns the random
amount $X_n$), then the firm is on the safe side provided that $EX > L$. □

4. Empirical estimation. Here we will describe procedures for the empirical estima- 
tion of Alpha V@R, Beta V@R, and Weighted V@R, which serve as coherent counterparts 
of the historic V@R estimation (see jJHl Sect. 6]). Let $X$ be the increment of the value
of some portfolio over the unit time period $\Delta$.

In order to estimate Alpha V@R with $\alpha\in\mathbb{N}$, one should first choose the number of
trials $K\in\mathbb{N}$ and generate independent draws $(x_{kl};\ k = 1,\dots,K,\ l = 1,\dots,\alpha)$ of $X$.
This can be done by one of the following techniques:

• Each $x_{kl}$ is drawn uniformly from the recent $T$ realizations $x_1,\dots,x_T$ of $X$. For
example, if $\Delta$ is one day (which is a typical choice), these are $T$ recent daily
increments of the value of the portfolio under consideration.

• Each $x_{kl}$ is drawn from recent $T$ realizations $x_1,\dots,x_T$ of $X$, according to a
probability measure $\nu$ on $\{1,\dots,T\}$. A natural example is: $T = \infty$, so that $x_t$
is the increment of the portfolio's value over the interval $[-t\Delta, -(t-1)\Delta]$, where
0 is the current time instant; $\nu$ is the geometric distribution with a parameter $\lambda$.
This method enables one to put more mass on recent realizations of $X$. It is at
the basis of the weighted historical simulation (see |T7J Sect. 5.3]). Typically, $\lambda$ is
chosen between 0.95 and 0.99.

• Each $x_{kl}$ might be generated using the bootstrap method, i.e. we split the time axis
into small intervals of length $n^{-1}\Delta$ and create each $x_{kl}$ as a sum of the increments
of the portfolio's value over $n$ randomly chosen small intervals. The bootstrap
method might be combined with the weighting described above, i.e. we take recent
small intervals with a higher probability than old ones.

Having generated $x_{kl}$, one should calculate the array

$$l_k = \operatorname*{argmin}_{l=1,\dots,\alpha} x_{kl}, \quad k = 1,\dots,K.$$

According to (2.20), an empirical estimate of $\rho_\alpha(X)$ is provided by

$$\rho_e(X) = -\frac{1}{K}\sum_{k=1}^K x_{kl_k}.$$
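
The resampling procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `beta_var_mc`, its arguments, and the use of NumPy are choices made here. With `beta=1` it averages the minimum of $\alpha$ draws as in (2.20); with `beta > 1` it averages the $\beta$ smallest draws, which is the Beta V@R analogue suggested by (2.18). The optional `nu` plays the role of the weighting measure $\nu$ on the historical realizations.

```python
import numpy as np

def beta_var_mc(history, alpha, beta=1, n_trials=10_000, nu=None, seed=0):
    """Monte Carlo estimate of Beta V@R (Alpha V@R when beta == 1).

    history  : recent realizations x_1, ..., x_T of the P&L increment X
    alpha    : number of draws per trial (natural number)
    beta     : number of smallest draws averaged in each trial
    n_trials : number of trials K
    nu       : optional probabilities on the realizations (uniform if None)
    """
    if not 1 <= beta <= alpha:
        raise ValueError("need 1 <= beta <= alpha")
    rng = np.random.default_rng(seed)
    history = np.asarray(history, dtype=float)
    # x_{kl}: K rows, each holding alpha independent draws
    draws = rng.choice(history, size=(n_trials, alpha), p=nu)
    # the beta smallest draws of each row; a partition avoids a full sort
    smallest = np.partition(draws, beta - 1, axis=1)[:, :beta]
    return float(-smallest.mean())
```

A call such as `beta_var_mc(daily_pnl, alpha=5)` then returns an Alpha V@R estimate, where `daily_pnl` stands for a hypothetical array of recent daily increments.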

In order to estimate Beta V@R with $\alpha,\beta\in\mathbb{N}$, one should generate $x_{kl}$ similarly.
Let $l_{k1},\dots,l_{k\beta}$ be the numbers $l\in\{1,\dots,\alpha\}$ such that the corresponding $x_{kl}$ stand at
the first $\beta$ places (in the increasing order) among $x_{k1},\dots,x_{k\alpha}$. According to (2.18), an
empirical estimate of $\rho_{\alpha,\beta}(X)$ is provided by

$$\rho_e(X) = -\frac{1}{K\beta}\sum_{k=1}^K\sum_{i=1}^\beta x_{kl_{ki}}.$$

In order to estimate Weighted V@R, one should fix a data set $x_1,\dots,x_T$ and a
measure $\nu$ on $\{1,\dots,T\}$. This can be done by one of the following techniques:

• The values $x_1,\dots,x_T$ are recent $T$ realizations of $X$.

• The values $x_1,\dots,x_T$ are obtained through the bootstrap technique, i.e. each $x_t$
is a sum of the increments of the portfolio's value over randomly chosen small
intervals; $\nu$ is uniform.

Let $x_{(1)},\dots,x_{(T)}$ be the values $x_1,\dots,x_T$ in the increasing order. Define $n(t)$ through
the equality $x_{(t)} = x_{n(t)}$. According to (2.8), an empirical estimate of $\rho_\mu(X)$ is provided
by

$$\rho_e(X) = -\sum_{t=1}^T x_{n(t)}\int_{z_{t-1}}^{z_t}\psi_\mu(x)\,dx,$$

where $z_t = \sum_{i=1}^t \nu\{n(i)\}$ and $\psi_\mu$ is given by (2.7).
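
The Weighted V@R estimate can be sketched as follows. Since $\int_{z_{t-1}}^{z_t}\psi_\mu(x)\,dx = \Psi_\mu(z_t) - \Psi_\mu(z_{t-1})$ with $\Psi_\mu$ from (2.9), the sketch takes $\Psi_\mu$ as a vectorized callable rather than integrating $\psi_\mu$ numerically. The function name, the argument layout, and the two example distortions are assumptions made for the illustration.

```python
import numpy as np

def weighted_var(x, Psi, nu=None):
    """Empirical Weighted V@R: -sum_t x_(t) * (Psi(z_t) - Psi(z_{t-1})).

    x   : data set x_1, ..., x_T
    Psi : vectorized Psi_mu on [0, 1] with Psi(0) = 0 and Psi(1) = 1
    nu  : weights of the data points (uniform if None)
    """
    x = np.asarray(x, dtype=float)
    T = len(x)
    nu = np.full(T, 1.0 / T) if nu is None else np.asarray(nu, dtype=float)
    order = np.argsort(x)                              # the permutation n(t)
    z = np.concatenate(([0.0], np.cumsum(nu[order])))  # z_0, z_1, ..., z_T
    z[-1] = 1.0                                        # guard against rounding
    weights = np.diff(Psi(z))
    return float(-np.dot(x[order], weights))

# Example distortions (assumed helpers): Tail V@R and Alpha V@R.
def tail_Psi(lam):
    return lambda z: np.minimum(z / lam, 1.0)

def alpha_Psi(alpha):
    return lambda z: 1.0 - (1.0 - z) ** alpha
```

With `alpha_Psi`, the result is the exact value of $-E\min_i X_i$ under the empirical distribution, so it can serve as a check for the Monte Carlo estimate of Alpha V@R. Note that this route sorts the data, whereas the resampling route does not.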

An advantage of Weighted V@R over Alpha V@R and Beta V@R is that it is a wider
class. However, Beta V@R is already a rather flexible family. A big advantage of Alpha
V@R and Beta V@R is that their empirical estimation procedure does not require the data
ordering (the number of operations required to order the data set $x_1,\dots,x_T$ is $T\ln T$;
this is a particularly unpleasant number for $T = \infty$, which is one of typical possible
choices in estimating Alpha V@R and Beta V@R). Note that, for estimating V@R from
a data set of size $T$, one needs to order the time series, so the number of operations
required grows as $T\ln T$. Thus, Alpha V@R and Beta V@R are not only much
wiser than V@R; they are also estimated faster!



3 Factor Risk 

1. $L^1$-spaces. Let $(\Omega,\mathcal{F},P)$ be a probability space and $u$ be a coherent utility with the
determining set $\mathcal{D}$.

For the theorems below, we need to define the $L^1$-spaces associated with a coherent
risk measure (they were introduced in [14]). The weak and strong $L^1$-spaces are

$$L^1_w(\mathcal{D}) = \{X\in L^0 : u(X) > -\infty,\ u(-X) > -\infty\},$$
$$L^1_s(\mathcal{D}) = \Bigl\{X\in L^0 : \lim_{n\to\infty}\sup_{Q\in\mathcal{D}} E_Q|X|I(|X|>n) = 0\Bigr\}.$$

Clearly, $L^1_s(\mathcal{D}) \subseteq L^1_w(\mathcal{D})$. In general, this inclusion might be strict. Indeed,
let $X_0$ be a positive unbounded random variable with $P(X_0 = 0) > 0$ and let
$\mathcal{D} = \{Q\in\mathcal{P} : E_Q X_0 = 1\}$. Then $X_0\in L^1_w(\mathcal{D})$, but $X_0\notin L^1_s(\mathcal{D})$. However, as shown by
the examples below, in most natural situations these two spaces coincide and have a very
simple form.

Example 3.1. (i) If $\mathcal{D} = \{Q\}$ is a singleton, then $L^1_w(\mathcal{D}) = L^1_s(\mathcal{D}) = L^1(Q)$, which
motivates the notation.

(ii) For Weighted V@R, we have $L^1_w(\mathcal{D}_\mu) = L^1_s(\mathcal{D}_\mu)$ (see jH Subsect. 2.2]), so we
can denote this space simply by $L^1(\mathcal{D}_\mu)$. It is clear from (2.12) that $P\in\mathcal{D}_\mu$, so that
$L^1(\mathcal{D}_\mu)\subseteq L^1$ (throughout the paper, $L^1$ stands for $L^1(P)$). In general, this inclusion
can be strict. However, if $\int_{(0,1]}\lambda^{-1}\mu(d\lambda) < \infty$ (for example, the weighting measures of
Tail V@R, Beta V@R with $\beta > 0$, and Alpha V@R satisfy this condition), then it is
seen from (2.12) that all the densities from $\mathcal{D}_\mu$ are bounded by $\int_{(0,1]}\lambda^{-1}\mu(d\lambda)$, so that
$L^1(\mathcal{D}_\mu) = L^1$.

(iii) If $\rho$ is the moment-based risk measure of Example 2.8 with $\alpha > 0$, then clearly
$L^1_w(\mathcal{D})\subseteq L^p$ and $L^p\subseteq L^1_s(\mathcal{D})$, so that $L^1_w(\mathcal{D}) = L^1_s(\mathcal{D}) = L^p$. □

2. Factor risk. Let $Y$ be a random variable (resp., random vector) meaning the
increment over the unit time period of some market factor (resp., several market factors).

Definition 3.2. The factor utility is

$$u^f(X;Y) = \inf_{Q\in E(\mathcal{D}|Y)} E_Q X,$$

where $E(\mathcal{D}|Y) := \{E(Z|Y) : Z\in\mathcal{D}\}$.

The factor risk is $\rho^f(X;Y) := -u^f(X;Y)$.

Remarks. (i) As $\mathcal{D}$ is convex, $E(\mathcal{D}|Y)$ is also convex.

(ii) If $\mathcal{D}$ is convex and $L^1$-closed, then $E(\mathcal{D}|Y)$ is not necessarily $L^1$-closed. As an
example, consider $\Omega = [0,1]^2$ endowed with the Lebesgue measure and let

$$\mathcal{D} = \Bigl\{\sum_{n=1}^\infty a_n Z_n : a_n\in\mathbb{R}_+,\ \sum_{n=1}^\infty a_n = 1\Bigr\},$$

where

$$Z_n(x_1,x_2) = \begin{cases} 1/n & \text{if } x_1\le 1/2,\\ 2n-1 & \text{if } x_1 > 1/2 \text{ and } x_2\le 1/n,\\ 0 & \text{otherwise.}\end{cases}$$

Let $Y(x_1,x_2) = x_1$. Then

$$E(Z_n|Y) \xrightarrow[n\to\infty]{L^1} 2I(x_1 > 1/2) \notin E(\mathcal{D}|Y).$$

(iii) If $\mathcal{D}$ is convex, $L^1$-closed, and uniformly integrable, then $E(\mathcal{D}|Y)$ is also
convex, $L^1$-closed, and uniformly integrable. Indeed, let $Z_n\in\mathcal{D}$ be such that
$E(Z_n|Y)\xrightarrow{L^1} Z$. By Komlós' principle of subsequences (see [HH]), we can select a
sequence $\widetilde{Z}_n\in\mathrm{conv}(Z_n,Z_{n+1},\dots)$ such that $\widetilde{Z}_n\xrightarrow{\text{a.s.}}\widetilde{Z}$. As $\mathcal{D}$ is convex and uniformly
integrable, the convergence holds in $L^1$ and $\widetilde{Z}\in\mathcal{D}$. Then $E(\widetilde{Z}|Y) = Z$, so that $E(\mathcal{D}|Y)$
is $L^1$-closed. Its uniform integrability is a well-known fact. □

Theorem 3.3. For $X\in L^1_s(\mathcal{D})\cap L^1_s(E(\mathcal{D}|Y))\cap L^1$, we have

$$u^f(X;Y) = u(E(X|Y)).$$

This theorem follows from

Lemma 3.4. For $X\in L^1_s(\mathcal{D})\cap L^1_s(E(\mathcal{D}|Y))\cap L^1$ and $Z\in\mathcal{D}$, we have

$$EXE(Z|Y) = EE(X|Y)Z.$$

Proof. We can write $X = \xi_n + \eta_n$, where $\xi_n = XI(|X|\le n)$, $\eta_n = XI(|X| > n)$,
$n\in\mathbb{N}$. By the definition of $L^1_s$-spaces, $E|\eta_n|Z\to 0$ and $E|\eta_n|E(Z|Y)\to 0$. The equality

$$E\xi_n E(Z|Y) = EE(\xi_n|Y)E(Z|Y) = EE(\xi_n|Y)Z$$

and the estimates

$$E|E(\eta_n|Y)|Z \le EE(|\eta_n|\,|Y)Z = EE(|\eta_n|\,|Y)E(Z|Y) = E|\eta_n|E(Z|Y)$$

yield the desired statement. □

Recall that a probability space $(\Omega,\mathcal{F},P)$ is called atomless if for any $A\in\mathcal{F}$ with
$P(A) > 0$, there exist $A_1, A_2\in\mathcal{F}$ such that $A_1, A_2\subseteq A$, $A_1\cap A_2 = \emptyset$, and $P(A_i) > 0$.

Corollary 3.5. Suppose that $(\Omega,\mathcal{F},P)$ is atomless and $u$ is law invariant. Then, for
$X\in L^1_s(\mathcal{D})$, we have

$$u^f(X;Y) = u(E(X|Y)).$$

Proof. By Corollary A.2, $L^1_s(E(\mathcal{D}|Y))\supseteq L^1_s(\mathcal{D})$. Furthermore, it is clear from (2.12)
that $P\in\mathcal{D}_\mu$ for any $\mu$, so that, by Theorem A.1, $P\in\mathcal{D}$ and $L^1\supseteq L^1_s(\mathcal{D})$. Now, the
result follows from Theorem 3.3. □

Example 3.6. Let $u$ be a law invariant coherent utility that is finite on Gaussian
random variables. Let $(X,Y) = (X,Y^1,\dots,Y^M)$ have a jointly Gaussian distribution.
Denote $\overline{X} = X - EX$, $\overline{Y} = Y - EY$. We can represent $X$ as

$$X = EX + \sum_{m=1}^M h^m\overline{Y}^m + \xi,$$

where $E\xi = 0$ and $\xi$ is independent of $Y$. Then

$$u^f(X;Y) = EX - \gamma\Bigl(E\Bigl(\sum_{m=1}^M h^m\overline{Y}^m\Bigr)^2\Bigr)^{1/2} = EX - \gamma\,\bigl\|\mathrm{pr}_{\mathrm{Lin}(\overline{Y}^1,\dots,\overline{Y}^M)}(\overline{X})\bigr\|,$$

where $\mathrm{Lin}(\overline{Y}^1,\dots,\overline{Y}^M)$ denotes the linear space spanned by $\overline{Y}^1,\dots,\overline{Y}^M$, pr denotes the
projection, and $\gamma$ is provided by Example 2.11 (i).

Let us give an expression for $u^f(X;Y)$ in the matrix form. Denote $a^m = \mathrm{cov}(X,Y^m)$
and let $C$ be the covariance matrix of $Y$. Let $L$ be the image of $\mathbb{R}^M$ under the map
$x\mapsto Cx$. Note that $a\in L$. The inverse $C^{-1}\colon L\to L$ is correctly defined. The vector $h$
is found from the condition

$$\mathrm{cov}(X - \langle h,Y\rangle, Y^m) = a^m - (Ch)^m = 0, \quad m = 1,\dots,M,$$

which shows that $h = C^{-1}a$. We have

$$E\Bigl(\sum_{m=1}^M h^m\overline{Y}^m\Bigr)^2 = \langle h,Ch\rangle = \langle C^{-1}a,a\rangle.$$

As a result,

$$u^f(X;Y) = EX - \gamma\langle C^{-1}a,a\rangle^{1/2}.$$
In particular, if $Y$ is one-dimensional, then

$$u^f(X;Y) = EX - \gamma\,\frac{|\mathrm{cov}(X,Y)|}{(\mathrm{var}\,Y)^{1/2}}.$$

From the financial point of view, $\rho^f(X;Y)$ means the risk of $X$ in view of the
uncertainty contained in $Y$. So, we could expect that passing from $Y$ to a compound random
vector $(Y,Y')$ would increase $\rho^f$. The theorem below states that this is indeed true for
law invariant risk measures.

Theorem 3.7. Suppose that $(\Omega,\mathcal{F},P)$ is atomless and $u$ is law invariant. Then, for
any random variable $X$ and any random vectors $Y$, $Y'$, we have

$$u^f(X;Y,Y') \le u^f(X;Y).$$

Proof. By Theorem A.1, $\mathcal{D} = \bigcup_{\mu\in\mathfrak{M}}\mathcal{D}_\mu$ with some set $\mathfrak{M}$ of probability measures on
$(0,1]$. It is seen from (2.12) and the Jensen inequality that

$$E(\mathcal{D}_\mu|Y) = \{Z\in L^0 : Z \text{ is } Y\text{-measurable},\ Z\ge 0,\ EZ = 1,$$
$$\text{and } E(Z-x)^+ \le \Phi_\mu(x)\ \forall x\in\mathbb{R}_+\},$$

and the similar representation is true for $E(\mathcal{D}_\mu|Y,Y')$. Now, it is clear from the Jensen
inequality that $E(\mathcal{D}_\mu|Y)\subseteq E(\mathcal{D}_\mu|Y,Y')$, so that $E(\mathcal{D}|Y)\subseteq E(\mathcal{D}|Y,Y')$, and the result
follows. □

Remark. Coherent risk measures $\rho$ with the property $\rho(E(X|\mathcal{G}))\le\rho(X)$ for any $X$
and any sub-$\sigma$-field $\mathcal{G}$ of $\mathcal{F}$ are called dilatation monotonous (this property was
introduced by J. Leitner |1H!). Thus, Theorems 3.3 and 3.7 show that on an atomless space
any law invariant risk measure is dilatation monotonous. □

The example below shows that the condition of law invariance is important in
Theorem 3.7.

Example 3.8. Let $\mathcal{D}$ consist of a unique measure $Q$. Take $Y = 0$ and let $Y'$ be such
that $\sigma(Y') = \mathcal{F}$. Then $u^f(X;Y) = EX$, while $u^f(X;Y,Y') = u(X) = E_Q X$. Clearly, the
inequality $E_Q X\le EX$ might be violated. □

3. Factor model. The question that immediately arises in connection with the factor
risks is: How close is $u^f(X;Y)$ to $u(X)$? Below we provide a sufficient condition for the
closeness between $u^f(X;Y)$ and $u(X)$. This will be done within the framework of the
factor model, which is very popular in statistics.

Let $u$ be a law invariant coherent utility that is finite on Gaussian random variables
and $F = (F^1,\dots,F^M)$ be a random vector whose components belong to $L^1_w(\mathcal{D})$. We will
assume that $u(\langle b,F\rangle) < 0$ for any $b\in\mathbb{R}^M\setminus\{0\}$. Let

$$X_n = B_nF + \xi_n, \quad n\in\mathbb{N},$$

where $B_n$ is an $n\times M$ matrix and $\xi_n = (\xi_n^1,\dots,\xi_n^n)$, where $\xi_n^1,\dots,\xi_n^n$, $F$ are independent
and $\xi_n^k$ is Gaussian with mean 0 and variance $(\sigma_n^k)^2$. We will assume that there exists a
sequence $(a_n)$ such that $a_n\to\infty$ and

$$a_n^{-1}\sum_{k=1}^n B_n^{km} \xrightarrow[n\to\infty]{} b^m, \quad m = 1,\dots,M,$$

$$a_n^{-2}\sum_{k=1}^n (\sigma_n^k)^2 \xrightarrow[n\to\infty]{} 0$$

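with $b = (b^1,\dots,b^M)\ne 0$.

Theorem 3.9. We have

$$\frac{u^f\bigl(\sum_{k=1}^n X_n^k; F\bigr)}{u\bigl(\sum_{k=1}^n X_n^k\bigr)} \xrightarrow[n\to\infty]{} 1.$$

Proof. We have

$$a_n^{-1}u\Bigl(\sum_{k=1}^n X_n^k\Bigr) = u(\langle b_n,F\rangle + \eta_n),$$

where $b_n^m = a_n^{-1}\sum_{k=1}^n B_n^{km}$, $m = 1,\dots,M$, and $\eta_n$ is Gaussian with mean 0 and variance
$\sigma_n^2 = a_n^{-2}\sum_{k=1}^n (\sigma_n^k)^2$. Moreover, $\eta_n$ is independent of $F$. Let $\eta$ be a Gaussian random
variable with mean 0 and variance 1 that is independent of $F$. In view of the law invariance
of $u$,

$$u(\langle b_n,F\rangle + \eta_n) = u(\langle b_n,F\rangle + \sigma_n\eta).$$

Consider the set $G = \mathrm{cl}\{E_Q(F,\eta) : Q\in\mathcal{D}\}$, where "cl" denotes the closure and $\mathcal{D}$ is the
determining set of $u$. In view of the inclusions $F^m\in L^1_w(\mathcal{D})$, $\eta\in L^1_w(\mathcal{D})$, the set $G$ is a
convex compact in $\mathbb{R}^{M+1}$ (in the terminology of [T3], $G$ is the generator of $(F,\eta)$ and $u$).
Then

$$u(\langle b_n,F\rangle + \sigma_n\eta) = \inf_{Q\in\mathcal{D}} E_Q(\langle b_n,F\rangle + \sigma_n\eta) = \inf_{x\in G}\langle (b_n,\sigma_n),x\rangle$$
$$\xrightarrow[n\to\infty]{} \inf_{x\in G}\langle (b,0),x\rangle = \inf_{Q\in\mathcal{D}} E_Q\langle b,F\rangle = u(\langle b,F\rangle).$$

In a similar way we prove that

$$a_n^{-1}u^f\Bigl(\sum_{k=1}^n X_n^k; F\Bigr) = u(\langle b_n,F\rangle) \xrightarrow[n\to\infty]{} u(\langle b,F\rangle).$$

To complete the proof, it is sufficient to note that $u(\langle b,F\rangle) < 0$. □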

4. Multifactor risk. The previous theorem shows that in order to assess
the risk $\rho(X)$, one can take the main factors $Y^1,\dots,Y^M$ driving risk, and then
$\rho(X)\approx\rho^f(X;Y^1,\dots,Y^M)$. The question arises: What is the relationship between
$\rho^f(X;Y^1,\dots,Y^M)$ and $\rho^f(X;Y^1)+\dots+\rho^f(X;Y^M)$? First, we provide a positive
statement.

Proposition 3.10. Assume that $Y^1,\dots,Y^M$ are independent and $X = \sum_{m=1}^M X^m$,
where $X^m$ is $Y^m$-measurable, $X^m\in L^1_s(\mathcal{D})\cap L^1_s(E(\mathcal{D}|Y))\cap L^1$, and $EX^m = 0$. Then

$$\rho^f(X;Y^1,\dots,Y^M) \le \sum_{m=1}^M \rho^f(X;Y^m). \qquad (3.1)$$

Proof. We have

$$\rho^f(X;Y^1,\dots,Y^M) = \rho(E(X|Y^1,\dots,Y^M)) = \rho(X),$$
$$\rho^f(X;Y^m) = \rho(E(X|Y^m)) = \rho(X^m),$$

and the result follows from the subadditivity property of $\rho$. □

The conditions of the proposition are unrealistic because in practice different factors are
correlated. This might lead to violation of (3.1) as shown by the example below.

Example 3.11. Let $(Y^1,Y^2)$ be a Gaussian random vector with mean 0 and
covariance matrix

$$\begin{pmatrix} 1 & 1-\varepsilon\\ 1-\varepsilon & 1\end{pmatrix}.$$

Take $X = Y^1 - Y^2$. Let $\rho$ be a law invariant coherent risk measure that is finite on
Gaussian random variables. Then

$$\rho^f(X;Y^1,Y^2) = \rho(X) = \gamma(\mathrm{var}(Y^1-Y^2))^{1/2} = \gamma\sqrt{2\varepsilon},$$

where $\gamma$ is provided by Example 2.11 (i). On the other hand, by Example 3.6,

$$\rho^f(X;Y^1) = \rho^f(X;Y^2) = \gamma\,\frac{|\mathrm{cov}(X,Y^1)|}{(\mathrm{var}\,Y^1)^{1/2}} = \gamma\varepsilon.$$

If $\varepsilon$ is small enough, then the inequality $\gamma\sqrt{2\varepsilon}\le 2\gamma\varepsilon$ is violated. □
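
The Gaussian formula $\rho^f(X;Y) = -EX + \gamma\langle C^{-1}a,a\rangle^{1/2}$ from Example 3.6 can be checked numerically on the example above. The sketch below is an illustration only: the function name and arguments are assumptions, and the pseudo-inverse is used as a stand-in for the inverse of $C$ on its image.

```python
import numpy as np

def gaussian_factor_risk(mean_x, cov_xy, cov_y, gamma):
    """Factor risk of a Gaussian X with respect to Gaussian factors Y.

    mean_x : EX
    cov_xy : vector a with a^m = cov(X, Y^m) (a scalar for a single factor)
    cov_y  : covariance matrix C of Y (a scalar for a single factor)
    gamma  : the constant of Example 2.11 (i) for the chosen risk measure
    """
    a = np.atleast_1d(np.asarray(cov_xy, dtype=float))
    C = np.atleast_2d(np.asarray(cov_y, dtype=float))
    h = np.linalg.pinv(C) @ a            # h = C^{-1} a on the image of C
    return float(-mean_x + gamma * np.sqrt(max(a @ h, 0.0)))

# Example 3.11 with eps = 0.01 and gamma = 1: X = Y^1 - Y^2.
eps = 0.01
C = [[1.0, 1.0 - eps], [1.0 - eps, 1.0]]
a = [eps, -eps]                          # cov(X, Y^1), cov(X, Y^2)
joint = gaussian_factor_risk(0.0, a, C, gamma=1.0)
single_sum = (gaussian_factor_risk(0.0, a[0], 1.0, gamma=1.0)
              + gaussian_factor_risk(0.0, a[1], 1.0, gamma=1.0))
print(joint, single_sum)                 # joint risk exceeds the sum of one-factor risks
```

Here `joint` reproduces $\gamma\sqrt{2\varepsilon}$ and `single_sum` reproduces $2\gamma\varepsilon$, so inequality (3.1) fails for these correlated factors.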




Figure 6. In Example 3.11, $\rho^f(X;Y^1,Y^2)/\gamma$
is the length of $X$, while $\rho^f(X;Y^n)/\gamma$ is the
length of the projection of $X$ on $Y^n$.

The effect described in this example has the following financial background. Suppose
that we have several correlated factors $Y^1,\dots,Y^M$ and a portfolio consisting of $M$ parts,
i.e. $X = \sum_m X^m$, where the risk of the $m$-th part is driven mainly by the $m$-th factor.
Then $X^m$ is correlated with $Y^k$ for $m\ne k$ simply because $Y^m$ and $Y^k$ are correlated.
Thus, when summing up $\rho^f(X;Y^m)$ over $m$, we are calculating the factor loading of $Y^m$
in $X^m$ once through the $m$-th factor risk and then several times more through the other
correlated factor risks. This might lead to a significant increase as well as a significant
reduction of the estimated risk (which was described by Example 3.11). In this situation,
the right way to estimate $\rho(X)$ is to take $\rho^f(X^1;Y^1)+\dots+\rho^f(X^M;Y^M)$. Indeed, if each
$X^m$ corresponds to a big portfolio, then the above results tell us that $\rho^f(X^m;Y^m)\approx\rho(X^m)$,
and then

$$\sum_{m=1}^M \rho^f(X^m;Y^m) \approx \sum_{m=1}^M \rho(X^m).$$

Another pleasant feature of this technique is that we estimate the $m$-th factor risk only for
a part of the portfolio rather than the whole portfolio, which speeds up the
computation.

However, it might happen that we cannot split a portfolio into several groups such
that the risk of each group is affected by one factor only (for example, the risk of credit
derivatives is affected by the whole yield curve). But then we might combine the one-factor
and the multifactor techniques as follows. Suppose that we can split the factors
into several groups (thus we again have $Y^1, \dots, Y^M$, but now these random variables are
multidimensional) and split the portfolio into $M$ parts, i.e. $X = \sum_m X^m$, so that the
risk of the $m$-th part is driven mainly by the $m$-th group of factors. Then we can assess
the risk of the portfolio as $\rho^f(X^1; Y^1) + \dots + \rho^f(X^M; Y^M)$.

5. Empirical estimation. In view of Theorem 3.3, the empirical estimation of
$\rho^f(X; Y)$ reduces to finding the function $f(y) = E(X \mid Y = y)$ and then applying the
procedures described at the end of Section 2 to $x_{kl} := f(y_{kl})$. However, for factor risks
we can use one more convenient method of choosing the data.

Suppose that $Y$ is one-dimensional and let $\sigma$ be the current volatility of $Y$. It might
be estimated through one of numerous well-known methods; in particular, $\sigma$ might be the
implied volatility. It is a widely accepted idea that volatility serves as the speed of growth
of the inner time (known also as the business or operational time) for $Y$. In other words,
if the current volatility is $\sigma$, then $Y$ is currently oscillating at the speed $\sigma^2$. Following
this idea, we can take as the data for $Y$ the values $y_1, \dots, y_T$, where $y_t$ is the increment
of $Y$ over the time interval $[-t\sigma^2\Delta, -(t-1)\sigma^2\Delta]$ and $\Delta$ is the unit time period. Here
it is reasonable to take a standardized time series for $Y$ rather than the ordinary one: we
calculate empirically the integrated volatility and take its inverse as the time change to
obtain the standardized time series from the ordinary one. This approach enables one

• to use large data sets; 

• to capture volatility predictions immediately. 

If $\Delta$ is one day (which is a typical choice), then $\sigma^2\Delta$ would be a non-integer number of
days, which is not very convenient. This can be overcome as follows. We choose a large
number $n \in \mathbb{N}$ and split the time axis into small intervals of length $n^{-1}\Delta$. Then we
approximate $\sigma^2$ by a rational number $m/n$ and generate each $y_t$ as a sum of increments
of $Y$ over $m$ randomly chosen small intervals. In other words, this is a combination of
the time change procedure with the bootstrap technique.
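
The combination of the time change with the bootstrap can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the function name `time_changed_increments` is hypothetical, the fine-grid increments are synthetic stand-in data, and drawing the $m$ small intervals uniformly with replacement is an assumption (the text only says "randomly chosen").

```python
import numpy as np

def time_changed_increments(fine_increments, sigma2, n, size, rng):
    """Bootstrap increments of Y over operational-time steps of length sigma2 * Delta.

    fine_increments: increments of the standardized series over intervals of
    length Delta / n; sigma2: current squared volatility; size: number of y_t.
    """
    fine_increments = np.asarray(fine_increments, dtype=float)
    m = max(1, int(round(sigma2 * n)))          # sigma2 is approximated by m / n
    # Assumption: the m small intervals are drawn uniformly with replacement.
    idx = rng.integers(0, fine_increments.size, size=(size, m))
    return fine_increments[idx].sum(axis=1)

rng = np.random.default_rng(0)
fine = rng.normal(scale=np.sqrt(1.0 / 20), size=5000)   # synthetic data, n = 20
y = time_changed_increments(fine, sigma2=1.5, n=20, size=1000, rng=rng)
```

Each generated value is a sum of $m = 30$ fine increments here, so its variance is roughly $\sigma^2$ times the variance of a unit-step increment of the standardized series.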

Instead of the time change procedure described above, one can also use the classical
scaling procedure, which is at the basis of the filtered historical simulation (see
Sect. 5.6]). Namely, instead of altering the time step for $y_t$, we keep the same time
step $\Delta$, but multiply each $y_t$ by the current volatility $\sigma$. As above, it is reasonable to
take a standardized time series for $Y$ rather than the ordinary one: we divide each observed
increment of $Y$ by its volatility estimated through one of the standard techniques (for
example, GARCH).
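
The scaling procedure can be sketched in the same spirit. The code below is an illustration only: an exponentially weighted moving average (EWMA) filter is used as a simple stand-in for GARCH, and the names `ewma_volatility`, `rescaled_increments` and the decay parameter `0.94` are assumptions, not part of the text.

```python
import numpy as np

def ewma_volatility(increments, decay=0.94):
    """Volatility estimate for each step, based on earlier observations only.

    Assumption: an EWMA filter stands in for GARCH purely for illustration.
    """
    increments = np.asarray(increments, dtype=float)
    var = np.empty_like(increments)
    var[0] = np.mean(increments ** 2)           # crude initial level
    for t in range(1, increments.size):
        var[t] = decay * var[t - 1] + (1.0 - decay) * increments[t - 1] ** 2
    return np.sqrt(var)

def rescaled_increments(increments, current_vol, decay=0.94):
    """Standardize each increment by its own volatility, then scale by current_vol."""
    increments = np.asarray(increments, dtype=float)
    return current_vol * increments / ewma_volatility(increments, decay)
```

The sketch assumes that the estimated volatility is never zero; a production version would need to guard against that case.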

Let us remark that both the time change and the scaling work only for one-dimensional
$Y$, because if $Y$ is multidimensional, its different components have different volatilities.
This is a big advantage of one-factor risks.

4 Portfolio Optimization 

1. Problem. Let $(\Omega, \mathcal{F}, P)$ be a probability space and $u^1, \dots, u^M$ be coherent utilities
with the determining sets $\mathcal{D}^1, \dots, \mathcal{D}^M$. A particular example we have in mind is
$u^m = u^f(\,\cdot\,; Y^m)$, where $Y^1, \dots, Y^M$ are the main factors driving the risk of a portfolio.
Let $X^1, \dots, X^d \in \bigcap_m L^1_s(\mathcal{D}^m)$ be the P&Ls produced by traded assets over the unit time
period, so that the space of possible P&Ls attained by various investment strategies is
$\{\langle h, X\rangle : h \in \mathbb{R}^d\}$, where $X = (X^1, \dots, X^d)$. We will assume that $u^m(\langle h, X\rangle) < 0$ for
any $h \in \mathbb{R}^d \setminus \{0\}$. This condition means that the risk of any trade is strictly positive
and is known as the No Good Deals condition (see [Ej). Let $E = (E^1, \dots, E^d)$ be the
vector of rewards for $X^1, \dots, X^d$. One might think of $E$ as the vector of expectations
$(EX^1, \dots, EX^d)$. However, this is not the only interpretation of $E$. In general, we mean
by $E$ the vector of subjective assessments by some agent of the profitability of the assets
$X^1, \dots, X^d$. In this case $E$ need not be related to $EX$, and $EX$ can be equal to zero.

We will consider the Markowitz-type optimization problem, risk being measured not 
as variance, but rather as the vector of risks: 

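$$\begin{cases} \langle h, E\rangle \longrightarrow \max, \\ h \in \mathbb{R}^d, \\ \rho^m(\langle h, X\rangle) \le c^m, \quad m = 1, \dots, M, \end{cases} \tag{4.1}$$

where $c^1, \dots, c^M \in (0, \infty)$ are fixed risk limits.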

2. Geometric solution. The paper [15; Subsect. 2.2] contains a geometric solution
of this problem with $M = 1$. Here we will present a similar geometric solution for an
arbitrary $M$. Let us introduce the notation

$$G^m = \mathrm{cl}\{E_Q X : Q \in \mathcal{D}^m\}, \quad m = 1, \dots, M,$$
$$G = \mathrm{conv}\{G^m / c^m : m = 1, \dots, M\}.$$

Note that $G^m$, $G$ are convex compacts in $\mathbb{R}^d$. According to the terminology of [14], $G^m$
is the generator of $X$ and $u^m$. The role of this set is seen from the equality

$$u^m(\langle h, X\rangle) = \inf_{x \in G^m} \langle h, x\rangle, \quad m = 1, \dots, M. \tag{4.2}$$

The right-hand side is the classical object of convex analysis termed the support function
of the set $G^m$. The notion of the generator was found to be very convenient for the
geometric solutions of various problems like capital allocation, portfolio optimization,
pricing, and equilibrium (see [14], [15]). As $u^m(\langle h, X\rangle) < 0$ for any $h \in \mathbb{R}^d \setminus \{0\}$, the
point $0$ belongs to the interior of $G$. Let $T$ be the intersection of the ray $(E, 0)$ with the
border of $G$. Denote by $N$ the set of inner normals to $G$ at the point $T$ (typically, $N$ is
a ray).

Theorem 4.1. The set of solutions of problem (4.1) is $\{h \in N : \langle h, T\rangle = -1\}$ and
the maximal $\langle h, E\rangle$ is $|E|/|T|$.

Remark. Note that $N$ is a non-empty cone, so that $\{h \in N : \langle h, T\rangle = -1\} \ne \emptyset$.
Furthermore, $0 < |E|/|T| < \infty$. □

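Proof of Theorem 4.1. Using (4.2), we can write

$$(4.1) \iff \begin{cases} \langle h, E\rangle \longrightarrow \max, \\ h \in \mathbb{R}^d, \\ \inf_{x \in G^m} \langle h, x\rangle \ge -c^m, \quad m = 1, \dots, M \end{cases} \iff \begin{cases} \langle h, E\rangle \longrightarrow \max, \\ h \in \mathbb{R}^d, \\ \inf_{x \in G} \langle h, x\rangle \ge -1. \end{cases}$$

Denote $\{h \in N : \langle h, T\rangle = -1\}$ by $H^*$. For any $h \in H^*$, we have

$$\inf_{x \in G} \langle h, x\rangle = \langle h, T\rangle = -1$$

and

$$\langle h, E\rangle = -\frac{|E|}{|T|}\,\langle h, T\rangle = \frac{|E|}{|T|}. \tag{4.3}$$

If $h \notin H^*$ and $\inf_{x \in G} \langle h, x\rangle \ge -1$, then either $h \notin N$ and
$\inf_{x \in G} \langle h, x\rangle < \langle h, T\rangle$, or $h \in N$ and $\inf_{x \in G} \langle h, x\rangle = \langle h, T\rangle \ne -1$.
In both cases $\langle h, T\rangle > -1$, and, due to the first equality in (4.3), $\langle h, E\rangle < |E|/|T|$. □

Figure 7. Solution of the optimization problem. Here $h^*$ denotes the optimal $h$.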

Remark. We can also provide a geometric solution of (4.1) under portfolio constraints
of the type $h \in H$, where $H$ is a convex cone, and the ambiguity of the reward vector $E$.
This is done by transforming (4.1) into the problem with one constraint as described
above and then applying the result of [15; Subsect. 2.2], where the cone constraints and
the ambiguity were taken into account. □

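Example 4.2. Let $X^1, \dots, X^d \in L^1(\mathcal{D}_\mu)$ and $Y$ be an $M$-dimensional random vector.
Then the generator of $u^f_\mu(\,\cdot\,; Y)$ and $X$ has the form

$$\mathrm{cl}\{E_Q X : Q \in E(\mathcal{D}_\mu \mid Y)\} = \mathrm{cl}\Big\{\int_{\mathbb{R}^M} f(y) Z(y)\,\mathsf{Q}(dy) : Z \ge 0,\ \int_{\mathbb{R}^M} Z(y)\,\mathsf{Q}(dy) = 1,$$
$$\text{and } \int_{\mathbb{R}^M} (Z(y) - x)^+\,\mathsf{Q}(dy) \le \Phi_\mu(x)\ \ \forall x \in \mathbb{R}_+\Big\},$$

where $f(y) = E(X \mid Y = y)$, $\mathsf{Q} = \mathrm{Law}\,Y$, and $\Phi_\mu$ is given by (2.13). In order to prove
this equality, denote its left-hand side by $G$ and its right-hand side by $G'$. Due to (2.12),

$$\inf_{x \in G'} \langle h, x\rangle = u'_\mu(\langle h, f\rangle), \quad h \in \mathbb{R}^d,$$

where $u'_\mu$ is minus the Weighted V@R with the weighting measure $\mu$ on the probability
space $(\mathbb{R}^M, \mathcal{B}(\mathbb{R}^M), \mathsf{Q})$. As $u_\mu$ and $u'_\mu$ depend only on the distribution of a random
variable (this is seen from (2.6), (2.10)) and $\mathrm{Law}_{\mathsf{Q}} \langle h, f\rangle = \mathrm{Law}\,\langle h, f(Y)\rangle$, we get

$$u'_\mu(\langle h, f\rangle) = u_\mu(\langle h, f(Y)\rangle) = u_\mu(E(\langle h, X\rangle \mid Y)) = u^f_\mu(\langle h, X\rangle; Y)$$
$$= \inf_{Q \in E(\mathcal{D}_\mu \mid Y)} E_Q \langle h, X\rangle = \inf_{x \in G} \langle h, x\rangle, \quad h \in \mathbb{R}^d.$$

Thus, the support functions of $G$ and $G'$ coincide. Furthermore, both $G$ and $G'$ are
convex and closed. As a result, $G = G'$. □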



3. Practical aspects. The geometric solution presented above provides a nice 
theoretical insight into the form of the optimal portfolio. It can be used if we have a 
model for the joint distribution of X 1 , . . . , X d . However, if we do not have such a model, 
but rather want to approach (4.1) empirically, then, instead of the geometric solution, the
following straightforward procedure can be employed. First of all, 



Due to the scaling property, the solution to this problem coincides up to multiplication 
by a positive (easily computable) constant with the solution of the problem 



This is a problem of maximizing a convex functional over an affine space. A typical
example we have in mind is $u^m = u^f(\,\cdot\,; Y^m)$, where $u$ is one of $u_\mu$, $u_{\alpha,\beta}$, or $u_\alpha$. In this
case the values $u^m(\langle h, X\rangle)$ can easily be estimated empirically through the procedures
described at the end of Section 2.

If $E$ is the vector of expected profits $(EX^1, \dots, EX^d)$, then its empirical estimation
is known to be an extremely unpleasant problem (see the discussion in t 9] and the 20's
example in [SI]), unlike the estimation of the volatility-type quantities $u^m$. The reason
is that this vector is very close to zero, and therefore, its direction (which is in fact the
input that we need) depends on the data in a very unstable way. One possible way
to overcome this problem is to use theoretical estimates of $E$ rather than empirical ones.
For example, Sharpe's SML relation ([58]) implies that

$$E^i = \beta^i \times \mathrm{const}, \quad i = 1, \dots, d$$

(recall that $X^i$ is the discounted P&L produced by the $i$-th asset). Thus, the direction
of $E$ (and this is what we need) coincides with that of $(\beta^1, \dots, \beta^d)$.
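
Since only the direction of $E$ matters, it can be read off from estimated betas. The sketch below is an illustration under assumptions: the betas are ordinary regression slopes against a user-chosen market proxy (`market_pnl`), and the function name `reward_direction_from_betas` is hypothetical.

```python
import numpy as np

def reward_direction_from_betas(asset_pnl, market_pnl):
    """Unit vector proportional to (beta^1, ..., beta^d).

    asset_pnl: array of shape (T, d); market_pnl: array of shape (T,).
    Assumption: betas are taken against a user-chosen market proxy.
    """
    asset_pnl = np.asarray(asset_pnl, dtype=float)
    market_pnl = np.asarray(market_pnl, dtype=float)
    centered_market = market_pnl - market_pnl.mean()
    centered_assets = asset_pnl - asset_pnl.mean(axis=0)
    # beta^i = cov(X^i, market) / var(market)
    betas = centered_assets.T @ centered_market / (centered_market @ centered_market)
    return betas / np.linalg.norm(betas)
```

The normalization to unit length is a convenience; any positive multiple of the beta vector gives the same optimal portfolio direction.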

5 Risk Contribution 

1. Extreme measures. Let $(\Omega, \mathcal{F}, P)$ be a probability space and $u$ be a coherent utility
with the determining set $\mathcal{D}$. Let $W$ be a random variable meaning the P&L produced
by some portfolio over the unit time period.

The following definition was introduced in [14].

Definition 5.1. A measure $Q \in \mathcal{D}$ is an extreme measure for $W$ if
$E_Q W = u(W) \in (-\infty, \infty)$.

The set of extreme measures will be denoted by $\mathcal{X}_{\mathcal{D}}(W)$.

Proposition 5.2. If the determining set $\mathcal{D}$ is $L^1$-closed and uniformly integrable,
while $W \in L^1_s(\mathcal{D})$, then $\mathcal{X}_{\mathcal{D}}(W) \ne \emptyset$.

Proof. It is clear that $u(W) \in (-\infty, \infty)$. Find a sequence $Z_n \in \mathcal{D}$ such that
$EZ_n W \to u(W)$. By the Dunford-Pettis criterion, $\mathcal{D}$ is compact with respect to the weak
topology $\sigma(L^1, L^\infty)$. Therefore, the sequence $(Z_n)$ has a weak limit point $Z_\infty \in \mathcal{D}$.
Clearly, the map $\mathcal{D} \ni Z \mapsto EZW$ is weakly continuous. Hence, $EZ_\infty W = u(W)$, which
means that $Z_\infty \in \mathcal{X}_{\mathcal{D}}(W)$. □

Remark. It is seen from (2.12) that the determining set of Weighted V@R is $L^1$-closed
and uniformly integrable (note that $\Phi_\mu(x) \to 0$ as $x \to \infty$). □

Example 5.3. (i) If $\lambda \in (0, 1]$ and $W \in L^1$, then

$$\mathcal{X}_{\mathcal{D}_\lambda}(W) = \{Z : Z \ge 0,\ EZ = 1,\ Z = \lambda^{-1} \text{ a.s. on } \{W < q_\lambda(W)\},$$
$$\text{and } Z = 0 \text{ a.s. on } \{W > q_\lambda(W)\}\}, \tag{5.1}$$

where $\mathcal{D}_\lambda$ is given by (2.3). Indeed, if $Z$ belongs to the right-hand side of (5.1), then, for
any $Z' \in \mathcal{D}_\lambda$, we have

$$EWZ' - EWZ = E(W - q_\lambda(W))Z' - E(W - q_\lambda(W))Z$$
$$= E[(W - q_\lambda(W))(Z' - \lambda^{-1})\,I(W - q_\lambda(W) < 0) + (W - q_\lambda(W))\,Z'\,I(W - q_\lambda(W) > 0)].$$

It is seen that this quantity is non-negative and equals zero if and only if $Z'$ belongs to the
right-hand side of (5.1).

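(ii) Let $\Omega = \{1, \dots, T\}$ and $W(t) = w_t$. Assume that $w_1 < \dots < w_T$. Set
$z_t = \sum_{i=1}^{t} P\{i\}$. It is seen from (i) that, for any $\lambda \in (0, 1]$, $\mathcal{X}_{\mathcal{D}_\lambda}(W)$ consists of the
unique measure $Q_\lambda(W)$ having the form

$$Q_\lambda(W)\{i\} = \begin{cases} \lambda^{-1} P\{i\}, & i \le n-1, \\ 1 - \lambda^{-1} z_{n-1}, & i = n, \\ 0, & i > n, \end{cases}$$

where $n$ is such that $z_{n-1} < \lambda \le z_n$. This can be rewritten as

$$Q_\lambda(W)\{i\} = \lambda^{-1} \int_0^\lambda I(z_{i-1} < x < z_i)\,dx, \quad i = 1, \dots, T.$$

It follows from [13; Prop. 6.2] that $\mathcal{X}_{\mathcal{D}_\mu}(W)$ consists of the unique measure
$Q_\mu(W) = \int_{(0,1]} Q_\lambda(W)\,\mu(d\lambda)$. We have

$$Q_\mu(W)\{i\} = \int_{(0,1]} \int_0^\lambda \lambda^{-1} I(z_{i-1} < x < z_i)\,dx\,\mu(d\lambda)$$
$$= \int_0^1 \int_{[x,1]} \lambda^{-1} I(z_{i-1} < x < z_i)\,\mu(d\lambda)\,dx$$
$$= \int_0^1 I(z_{i-1} < x < z_i)\,\psi_\mu(x)\,dx$$
$$= \int_{z_{i-1}}^{z_i} \psi_\mu(x)\,dx, \quad i = 1, \dots, T,$$

where $\psi_\mu$ is given by (2.7).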

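(iii) If $W \in L^1(\mathcal{D}_\mu)$ has a continuous distribution, then $\mathcal{X}_{\mathcal{D}_\mu}(W)$ consists of the unique
measure $Q_\mu(W) = \psi_\mu(F(W))P$, where $F$ is the distribution function of $W$ (for the proof,
see [13; Sect. 6]). The measure $Q_\mu$ has already appeared in (2.11).

(iv) Suppose that $\alpha, \beta \in \mathbb{N}$ and $W \in L^1(\mathcal{D}_{\alpha,\beta})$ has a continuous distribution. For
Beta V@R, the measure $Q_\mu(W)$ of the previous example gets a more concrete form
$Q_{\alpha,\beta}(W) = \psi_{\alpha,\beta}(F(W))P = \varphi(W)P$, where $\psi_{\alpha,\beta}$ is provided in Section 2.

Let us clarify the meaning of $\varphi$. Let $W_1, \dots, W_\alpha$ be independent copies of $W$,
$W_{(1)}, \dots, W_{(\alpha)}$ be the corresponding order statistics, and $\xi$ be an independent random
variable uniformly distributed on $\{1, \dots, \beta\}$. According to the reasoning used for Beta V@R
in Section 2, the distribution function of $W_{(\xi)}$ is $\Psi_{\alpha,\beta} \circ F$, where

$$\Psi_{\alpha,\beta}(x) = \int_0^x \psi_{\alpha,\beta}(y)\,dy, \quad x \in [0, 1].$$

Consequently,

$$\varphi(x) = \psi_{\alpha,\beta}(F(x)) = \frac{d\Psi_{\alpha,\beta}(F(x))}{dF(x)} = \frac{d\,\mathrm{Law}\,W_{(\xi)}}{d\,\mathrm{Law}\,W}(x).$$

(v) For Alpha V@R with $\alpha \in \mathbb{N}$, we have $\psi_\alpha(x) = \alpha(1 - x)^{\alpha-1}$ and

$$\varphi(x) = \alpha(1 - F(x))^{\alpha-1} = \frac{d\,\mathrm{Law}\,\min\{W_1, \dots, W_\alpha\}}{d\,\mathrm{Law}\,W}(x).$$

□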
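
The finite-space formula of part (ii) is easy to evaluate directly. The following sketch computes $Q_\lambda(W)\{i\}$ as $\lambda^{-1}$ times the length of $[z_{i-1}, z_i] \cap [0, \lambda]$; the function name `tail_var_extreme_measure` and the four-point data are hypothetical, and the values of $W$ are assumed to be distinct.

```python
import numpy as np

def tail_var_extreme_measure(w, p, lam):
    """Q_lambda(W){i} for W(i) = w[i] with distinct values and P{i} = p[i]."""
    w = np.asarray(w, dtype=float)
    p = np.asarray(p, dtype=float)
    order = np.argsort(w)                            # states in increasing order of W
    z = np.clip(np.cumsum(p[order]), 0.0, 1.0)       # z_1, ..., z_T
    z_prev = np.concatenate(([0.0], z[:-1]))         # z_0, ..., z_{T-1}
    # lambda^{-1} times the length of [z_{i-1}, z_i] intersected with [0, lambda]
    mass_sorted = (np.minimum(z, lam) - np.minimum(z_prev, lam)) / lam
    mass = np.empty_like(mass_sorted)
    mass[order] = mass_sorted                        # back to the original labelling
    return mass

w = np.array([3.0, 1.0, 2.0, 4.0])
p = np.full(4, 0.25)
q = tail_var_extreme_measure(w, p, lam=0.375)
tail_var_utility = float(q @ w)                      # u_lambda(W) = E_Q W
```

With these numbers the two worst outcomes receive masses $2/3$ and $1/3$, which matches the three-case formula with $n = 2$.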



If an agent is using the classical expected utility $EU(X)$ to assess the quality of
his/her position, then there exists his/her "personal" measure with which he/she assesses
the quality of any possible trade. This measure is given by $Q = cU'(W_1)P$, where $W_1$ is
the agent's wealth at the terminal date and $c$ is the normalizing constant. The role of
this measure is seen from the equality

$$\lim_{\varepsilon \downarrow 0} \varepsilon^{-1}(EU(W_1 + \varepsilon X) - EU(W_1)) = EXU'(W_1) = c^{-1} E_Q X.$$

If $X$ is the P&L produced by some trade and $X$ is small as compared to $W_1$, then $X$
is profitable for the agent if and only if $E_Q X > 0$. The extreme measure is the coherent
substitute for this agent-specific measure, as seen from Theorem 5.5 stated below.

2. Risk contribution. The following definition was introduced in [14].

Definition 5.4. The utility contribution is defined as

$$u^c(X; W) = \inf_{Q \in \mathcal{X}_{\mathcal{D}}(W)} E_Q X, \quad X \in L^0.$$

The risk contribution is defined as $\rho^c(X; W) = -u^c(X; W)$.
If $\mathcal{X}_{\mathcal{D}}(W)$ is non-empty, then $u^c(\,\cdot\,; W)$ is a coherent utility.

Theorem 5.5. If $\mathcal{D}$ is $L^1$-closed and uniformly integrable and $X, W \in L^1_s(\mathcal{D})$, then

$$u^c(X; W) = \lim_{\varepsilon \downarrow 0} \varepsilon^{-1}(u(W + \varepsilon X) - u(W)).$$

For the proof, see [14; Subsect. 2.5].

For theoretical purposes, it is sometimes convenient to use a geometric representation
of $u^c(X; W)$. Suppose that $\mathcal{D}$ is $L^1$-closed and uniformly integrable, while
$X, W \in L^1_s(\mathcal{D})$. Denote by $G$ the generator $\mathrm{cl}\{E_Q(X, W) : Q \in \mathcal{D}\}$ and set $e_1 = (1, 0)$,
$e_2 = (0, 1)$. Then

$$\mathcal{X}_{\mathcal{D}}(W) = \{Q \in \mathcal{D} : E_Q W = \min_{z \in G} \langle e_2, z\rangle\},$$

and therefore,

$$u^c(X; W) = \min\{x : (x, \min_{z \in G} \langle e_2, z\rangle) \in G\}. \tag{5.2}$$
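
The directional-derivative characterization of Theorem 5.5 can be checked numerically on a finite sample. The sketch below does this for Tail V@R with equally likely observations; the helper names and the synthetic data are assumptions made for illustration, and the values of $W$ are taken to be distinct so that the extreme measure is unique.

```python
import numpy as np

def tail_weights(T, lam):
    """Masses Q_lambda puts on the sorted sample points (equal probabilities 1/T)."""
    z = np.arange(1, T + 1) / T
    z_prev = np.arange(0, T) / T
    return (np.minimum(z, lam) - np.minimum(z_prev, lam)) / lam

def tail_var_utility(x, lam):
    """Empirical u_lambda(X) = E_{Q_lambda(X)} X."""
    x = np.sort(np.asarray(x, dtype=float))
    return float(tail_weights(x.size, lam) @ x)

def tail_var_contribution(x, w, lam):
    """Empirical u^c_lambda(X; W) = E_{Q_lambda(W)} X (values of w assumed distinct)."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(np.asarray(w, dtype=float))
    return float(tail_weights(x.size, lam) @ x[order])

w = np.linspace(-1.0, 1.0, 200)          # synthetic P&L of the portfolio
x = np.cos(np.arange(200.0))             # synthetic P&L of the additional trade
eps = 1e-6
finite_difference = (tail_var_utility(w + eps * x, 0.05) - tail_var_utility(w, 0.05)) / eps
contribution = tail_var_contribution(x, w, 0.05)
```

On a finite sample the map $\varepsilon \mapsto u(W + \varepsilon X)$ is piecewise linear, so for small $\varepsilon$ the finite difference agrees with the contribution up to rounding error.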




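Example 5.6. (i) If $W$ is a constant, then $\mathcal{X}_{\mathcal{D}}(W) = \mathcal{D}$, so that $u^c(X; W) = u(X)$.

(ii) If $X = \lambda W$ with $\lambda \in \mathbb{R}_+$, then $u^c(X; W) = \lambda u(W)$ provided that $\mathcal{X}_{\mathcal{D}}(W) \ne \emptyset$.

(iii) Let $u$ be law invariant and $(X, W)$ be jointly Gaussian. We assume that
$X, W \in L^1_s(\mathcal{D})$ and that the covariance matrix $C$ of $(X, W)$ is non-degenerate. Set
$\bar{X} = X - EX$, $\bar{W} = W - EW$. As $(\bar{X}, \bar{W})$ can be represented as $C^{1/2}V$, where $V$ is a
standard two-dimensional Gaussian random vector, the generator $G$ of $(\bar{X}, \bar{W})$ has the
form $G = C^{1/2}G_V$, where $G_V$ is the generator of $V$. Clearly, $G_V$ is the ball of radius $\gamma$,
where $\gamma$ is provided by Example 2.11 (i). Thus,

$$G = \{x \in \mathbb{R}^2 : \langle C^{-1/2}x, C^{-1/2}x\rangle \le \gamma^2\} = \{x \in \mathbb{R}^2 : \langle x, C^{-1}x\rangle \le \gamma^2\}.$$

Set $e_1 = (1, 0)$, $e_2 = (0, 1)$. By (5.2), $u^c(\bar{X}; \bar{W}) = \langle e_1, z_*\rangle$, where $z_* = \mathrm{argmin}_{z \in G} \langle e_2, z\rangle$.
The point $z_*$ is found from the condition

$$\frac{d}{d\varepsilon}\Big|_{\varepsilon=0} \langle z_* + \varepsilon e_1, C^{-1}(z_* + \varepsilon e_1)\rangle = 2\langle e_1, C^{-1}z_*\rangle = 0,$$

which shows that $C^{-1}z_* = \alpha e_2$ with some $\alpha < 0$, i.e. $z_* = \alpha C e_2$. The constant $\alpha$ is
found from the condition $\langle z_*, C^{-1}z_*\rangle = \gamma^2$, which shows that $\alpha = -\gamma\langle e_2, Ce_2\rangle^{-1/2}$. As a
result,

$$u^c(X; W) = EX + u^c(\bar{X}; \bar{W}) = EX - \gamma\,\frac{\langle e_1, Ce_2\rangle}{\langle e_2, Ce_2\rangle^{1/2}} = EX - \gamma\,\frac{\mathrm{cov}(X, W)}{(\mathrm{var}\,W)^{1/2}}.$$

Note that $u(X) = EX - \gamma(\mathrm{var}\,X)^{1/2}$. In particular, if $EX = EW = 0$, then

$$\frac{u^c(X; W)}{u(X)} = \mathrm{corr}(X, W) = \frac{\mathrm{V@R}^c(X; W)}{\mathrm{V@R}(X)}, \tag{5.3}$$

where $\mathrm{V@R}^c$ denotes the V@R contribution (for the definition, see [46; Sect. 7]).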

(iv) Let $\Omega = \{1, \dots, T\}$, $X(t) = x_t$, $W(t) = w_t$. Assume that all the values $w_t$ are
different. Let $w_{(1)}, \dots, w_{(T)}$ be the values $w_1, \dots, w_T$ in the increasing order. Define $n(t)$
through the equality $w_{(t)} = w_{n(t)}$. According to Example 5.3 (ii),

$$u^c_\mu(X; W) = E_{Q_\mu(W)} X = \sum_{t=1}^{T} x_{n(t)} \int_{z_{t-1}}^{z_t} \psi_\mu(x)\,dx,$$

where $z_t = \sum_{i=1}^{t} P\{n(i)\}$ (cf. Section 2).

(v) If $W \in L^1(\mathcal{D}_\mu)$ has a continuous distribution, then, according to Example 5.3 (iii),

$$u^c_\mu(X; W) = E_{Q_\mu(W)} X = EX\psi_\mu(F(W))$$

(cf. (2.11)). Note that this value is linear in $X$.

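(vi) Suppose that $\alpha, \beta \in \mathbb{N}$, $X, W \in L^1$, and $W$ has a continuous distribution.
Let $(X_1, W_1), \dots, (X_\alpha, W_\alpha)$ be independent copies of $(X, W)$ and $\xi$ be an independent
random variable uniformly distributed on $\{1, \dots, \beta\}$. Let $W_{(1)}, \dots, W_{(\alpha)}$ be the
corresponding order statistics. Define random variables $n(i)$ through the equality $W_{(i)} = W_{n(i)}$
(as $W$ has a continuous distribution, all the values $W_1, \dots, W_\alpha$ are a.s. different, so that
$n(i)$ is a.s. determined uniquely). We have (cf. (2.18))

$$EX_{n(\xi)} = \frac{1}{\beta} \sum_{i=1}^{\beta} EX_{n(i)}$$
$$= \frac{1}{\beta} \sum_{i=1}^{\beta} \sum_{j=1}^{\alpha} E\big[X_j\,I\{i-1 \text{ members of } W_1, \dots, W_\alpha \text{ are smaller than } W_j$$
$$\qquad\qquad \text{and } \alpha - i \text{ members of } W_1, \dots, W_\alpha \text{ are greater than } W_j\}\big]$$
$$= \frac{1}{\beta} \sum_{i=1}^{\beta} \sum_{j=1}^{\alpha} \binom{\alpha-1}{i-1} E\big[X_j F(W_j)^{i-1}(1 - F(W_j))^{\alpha-i}\big]$$
$$= \frac{1}{\beta} \sum_{i=1}^{\beta} \alpha \binom{\alpha-1}{i-1} E\big[X F(W)^{i-1}(1 - F(W))^{\alpha-i}\big]$$
$$= u^c_{\alpha,\beta}(X; W).$$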

(vii) Suppose that $\alpha \in \mathbb{N}$, $X, W \in L^1$, and $W$ has a continuous distribution. It
follows from (vi) that

$$u^c_\alpha(X; W) = EX_\xi, \qquad \xi = \mathop{\mathrm{argmin}}_{i = 1, \dots, \alpha} W_i,$$

where $(X_1, W_1), \dots, (X_\alpha, W_\alpha)$ are independent copies of $(X, W)$. □



3. Capital allocation. The notion of risk contribution is closely connected with
the capital allocation problem. Suppose that a firm consists of several desks, i.e.
$W = \sum_{n=1}^{N} W^n$, where $W^n$ is the P&L produced by the $n$-th desk. We use the following
definition.

Definition 5.7. A collection $x^1, \dots, x^N$ is a capital allocation between $W^1, \dots, W^N$ if

$$\sum_{n=1}^{N} x^n = \rho\Big(\sum_{n=1}^{N} W^n\Big), \tag{5.4}$$

$$\sum_{n=1}^{N} h^n x^n \le \rho\Big(\sum_{n=1}^{N} h^n W^n\Big) \quad \forall h^1, \dots, h^N \in \mathbb{R}_+. \tag{5.5}$$

From the financial point of view, $x^i$ means the contribution of the $i$-th component to
the total risk of the firm, or, equivalently, the capital that should be allocated to this
component. In order to illustrate the meaning of (5.5), consider the example $h^n = I(n \in J)$,
where $J$ is a subset of $\{1, \dots, N\}$. Then (5.5) means that the capital allocated to a part
of the firm does not exceed the risk carried by that part.

The following theorem was established in [14; Subsect. 2.4].

Theorem 5.8. Suppose that $\mathcal{D}$ is $L^1$-closed and uniformly integrable, while
$W^1, \dots, W^N \in L^1_s(\mathcal{D})$. Then the set of solutions of the capital allocation problem has
the form $\{-E_Q(W^1, \dots, W^N) : Q \in \mathcal{X}_{\mathcal{D}}(W)\}$.

If $\mathcal{X}_{\mathcal{D}}(W)$ consists of a unique measure $Q$, then the solution of the capital allocation
problem is unique and has the form $(\rho^c(W^1; W), \dots, \rho^c(W^N; W))$. In particular, in this
case

$$\rho(W) = -E_Q W = -\sum_{n=1}^{N} E_Q W^n = \sum_{n=1}^{N} \rho^c(W^n; W). \tag{5.6}$$
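
Identity (5.6) and the bound (5.5) with $h = e_n$ can be illustrated on simulated data. The sketch below uses the empirical Tail V@R with equally likely observations; the helper names, the level $\lambda = 0.05$ and the synthetic three-desk P&L matrix are assumptions made purely for illustration.

```python
import numpy as np

def tail_weights(T, lam):
    """Masses Q_lambda puts on the sorted sample points (equal probabilities 1/T)."""
    z = np.arange(1, T + 1) / T
    z_prev = np.arange(0, T) / T
    return (np.minimum(z, lam) - np.minimum(z_prev, lam)) / lam

def tail_var_risk(w, lam):
    """Empirical rho_lambda(W) = -u_lambda(W)."""
    w = np.sort(np.asarray(w, dtype=float))
    return -float(tail_weights(w.size, lam) @ w)

def capital_allocation(desk_pnl, lam):
    """Risk contributions rho^c(W^n; W) of the desks; desk_pnl has shape (T, N)."""
    desk_pnl = np.asarray(desk_pnl, dtype=float)
    total = desk_pnl.sum(axis=1)                 # W = sum of the desks' P&Ls
    order = np.argsort(total)                    # scenarios sorted by the firm's P&L
    return -(tail_weights(total.size, lam) @ desk_pnl[order])

rng = np.random.default_rng(0)
mixing = np.array([[1.0, 0.3, 0.0], [0.0, 1.0, 0.5], [0.0, 0.0, 2.0]])
desk_pnl = rng.normal(size=(1000, 3)) @ mixing   # synthetic correlated desks
allocation = capital_allocation(desk_pnl, lam=0.05)
```

The allocated amounts add up to the firm's total risk, while each desk's allocation does not exceed its stand-alone risk.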



4. Tail correlation. It follows from the inclusion $\mathcal{X}_{\mathcal{D}}(W) \subseteq \mathcal{D}$ that
$u^c(X; W) \ge u(X)$. Typically, for a random variable $X$ meaning the P&L of some
transaction, we have $u(X) < 0$ (i.e. its risk is strictly positive), so that

$$\varkappa(X; W) := \frac{u^c(X; W)}{u(X)} \le 1.$$

The coefficient $\varkappa$ is a good measure for the tail correlation between $X$ and $W$ (see (5.3)).

Let us recall that the standard tail correlation coefficient between $X$ and $W$ (see jJ7|
Sect. 5.2.3]) is defined as $\lim_{\lambda \downarrow 0} c_\lambda(X; W)$, where

$$c_\lambda(X; W) = \frac{P(X \le q_\lambda(X),\ W \le q_\lambda(W))}{P(X \le q_\lambda(X))} = \frac{E\,I(X \le q_\lambda(X))\,I(W \le q_\lambda(W))}{E\,I(X \le q_\lambda(X))\,I(X \le q_\lambda(X))}.$$

At the same time, $\varkappa(X; W)$ corresponding to $u = u_\lambda$ has the form

$$\varkappa_\lambda(X; W) = \frac{EX\,I(W \le q_\lambda(W))}{EX\,I(X \le q_\lambda(X))}.$$

This is the same as the expression for $c_\lambda(X; W)$ with $I(X \le q_\lambda(X))$ being replaced by $X$.
However, an essential difference between $\varkappa$ and the standard tail correlation coefficient
is that $\varkappa$ is not symmetric in $X$, $W$.

Let us also remark that, for the Weighted V@R, $\varkappa(X; W)$ remains unchanged under
monotonic transformations of $W$, i.e. $\varkappa(X; W) = \varkappa(X; f(W))$, where $f$ is a strictly
increasing function (this is seen from Example 5.6 (v)).
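
The coefficient $\varkappa_\lambda$ can be estimated from a sample of pairs $(x_t, w_t)$ by replacing the expectations with empirical tail averages. The following sketch is an illustration under assumptions: observations are equally likely, the values of $W$ are distinct, the empirical $u_\lambda(X)$ is negative, and the function names and synthetic data are hypothetical.

```python
import numpy as np

def tail_weights(T, lam):
    """Masses Q_lambda puts on the sorted sample points (equal probabilities 1/T)."""
    z = np.arange(1, T + 1) / T
    z_prev = np.arange(0, T) / T
    return (np.minimum(z, lam) - np.minimum(z_prev, lam)) / lam

def tail_correlation(x, w, lam):
    """Empirical kappa_lambda(X; W) = u^c_lambda(X; W) / u_lambda(X)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    weights = tail_weights(x.size, lam)
    contribution = weights @ x[np.argsort(w)]    # average of X over the worst W-scenarios
    utility = weights @ np.sort(x)               # average of X over its own worst scenarios
    return float(contribution / utility)         # assumes utility < 0

rng = np.random.default_rng(0)
w = rng.normal(size=1000)
x = 0.6 * w + 0.8 * rng.normal(size=1000)        # synthetic, positively related to w
kappa = tail_correlation(x, w, lam=0.05)
```

Because only the ranking of the $w_t$ enters the estimate, it is unchanged when $W$ is replaced by a strictly increasing function of $W$, and it is not symmetric in its two arguments.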

Let us study some basic properties of $\varkappa$. The two propositions below correspond to two
extremes: no correlation and complete correlation.

Proposition 5.9. Let $X, W \in L^1$ and suppose that $W$ has a continuous distribution.
Then $u^c_\lambda(X; W) = 0$ for any $\lambda \in (0, 1]$ if and only if $E(X \mid W) = 0$.

Proof. Set $Z_\lambda = \lambda^{-1} I(W \le q_\lambda(W))$. According to Example 5.6 (v),

$$u^c_\lambda(X; W) = \lambda^{-1} EX\,I(W \le q_\lambda(W)) = \lambda^{-1} \int_{(-\infty, q_\lambda(W)]} g(w)\,\mathsf{Q}(dw), \quad \lambda \in (0, 1],$$

where $g(w) = E(X \mid W = w)$ and $\mathsf{Q} = \mathrm{Law}\,W$. Now, the result is obvious. □

Recall that random variables $X$ and $W$ are called comonotone if
$(X(\omega_2) - X(\omega_1))(W(\omega_2) - W(\omega_1)) \ge 0$ for $P \times P$-a.e. $\omega_1, \omega_2$.

Proposition 5.10. Suppose that the support of $\mu$ is $[0, 1]$. Let $X, W \in L^1(\mathcal{D}_\mu)$.
Then $u^c_\mu(X; W) = u_\mu(X)$ if and only if $X$ and $W$ are comonotone.

Proof. Let us prove the "only if" part. The map $\mathcal{D}_\mu \ni Z \mapsto EWZ$ is continuous with
respect to the weak topology $\sigma(L^1, L^\infty)$. Hence, the set $\mathcal{X}_{\mathcal{D}_\mu}(W)$ is weakly closed. An
application of the Hahn-Banach theorem shows that $\mathcal{X}_{\mathcal{D}_\mu}(W)$ is $L^1$-closed. By Proposition 5.2,
there exists $Z \in \mathcal{X}_{\mathcal{D}_\mu}(W)$ such that $EXZ = u^c_\mu(X; W)$. According to PU
Th. 4.4], there exists a jointly measurable function $(Z_\lambda(\omega); \lambda \in (0, 1], \omega \in \Omega)$ such that
$Z = \int_{(0,1]} Z_\lambda\,\mu(d\lambda)$ with $Z_\lambda \in \mathcal{D}_\lambda$ for any $\lambda$. Then

$$\int_{(0,1]} EWZ_\lambda\,\mu(d\lambda) = EWZ = u_\mu(W) = \int_{(0,1]} u_\lambda(W)\,\mu(d\lambda),$$

and it follows that $Z_\lambda \in \mathcal{X}_{\mathcal{D}_\lambda}(W)$ for $\mu$-a.e. $\lambda$. Furthermore,

$$\int_{(0,1]} EXZ_\lambda\,\mu(d\lambda) = EXZ = u^c_\mu(X; W) = u_\mu(X) = \int_{(0,1]} u_\lambda(X)\,\mu(d\lambda),$$

and it follows that $Z_\lambda \in \mathcal{X}_{\mathcal{D}_\lambda}(X)$ for $\mu$-a.e. $\lambda$. Thus, $\mathcal{X}_{\mathcal{D}_\lambda}(W) \cap \mathcal{X}_{\mathcal{D}_\lambda}(X) \ne \emptyset$ for $\mu$-a.e. $\lambda$.
Using (5.1), we get

$$P(X > q_\lambda(X),\ W < q_\lambda(W)) = P(X < q_\lambda(X),\ W > q_\lambda(W)) = 0 \tag{5.7}$$

for $\mu$-a.e. $\lambda$. As the functions $q_\lambda(X)$ and $q_\lambda(W)$ are right-continuous in $\lambda$ and the
support of $\mu$ is $[0, 1]$, we deduce that (5.7) is satisfied for every $\lambda \in (0, 1]$. From this it
is easy to deduce that $P((X, W) \in f((0, 1])) = 1$, where $f(\lambda) = (q_\lambda(X), q_\lambda(W))$. Thus,
$X$ and $W$ are comonotone.

Let us prove the "if" part. By [2U Lem. 4.83], there exist a random variable $\xi$ and
increasing functions $f$, $g$ such that $X = f(\xi)$, $W = g(\xi)$. Set

$$Z_\lambda = \lambda^{-1} I(\xi < q_\lambda(\xi)) + c\,I(\xi = q_\lambda(\xi)),$$

where $c$ is the constant such that $EZ_\lambda = 1$. According to JOJ Th. 4.4],
$Z := \int_{(0,1]} Z_\lambda\,\mu(d\lambda) \in \mathcal{D}_\mu$. It is clear that $Z_\lambda \in \mathcal{X}_{\mathcal{D}_\lambda}(X) \cap \mathcal{X}_{\mathcal{D}_\lambda}(W)$. Hence,

$$EWZ = \int_{(0,1]} EWZ_\lambda\,\mu(d\lambda) = \int_{(0,1]} u_\lambda(W)\,\mu(d\lambda) = u_\mu(W),$$

$$EXZ = \int_{(0,1]} EXZ_\lambda\,\mu(d\lambda) = \int_{(0,1]} u_\lambda(X)\,\mu(d\lambda) = u_\mu(X).$$

As a result, $Z \in \mathcal{X}_{\mathcal{D}_\mu}(W)$ and $u^c_\mu(X; W) \le u_\mu(X)$. Since the reverse inequality is obvious,
we get $u^c_\mu(X; W) = u_\mu(X)$. □

5. Empirical estimation. In order to estimate empirically the Alpha V@R contribution
of $X$ to $W$ with $\alpha \in \mathbb{N}$, one should first choose the number of trials $K \in \mathbb{N}$ and generate
independent draws $(x_{kl}, w_{kl};\ k = 1, \dots, K,\ l = 1, \dots, \alpha)$ of $(X, W)$ using one of the procedures
described at the end of Section 2. Having generated $x_{kl}$, $w_{kl}$, one should calculate the
array

$$l_k = \mathop{\mathrm{argmin}}_{l = 1, \dots, \alpha} w_{kl}, \quad k = 1, \dots, K.$$

According to Example 5.6 (vii), an empirical estimate of $\rho^c_\alpha(X; W)$ is provided by

$$\rho^c_e(X; W) = -\frac{1}{K} \sum_{k=1}^{K} x_{k l_k}.$$

In order to estimate the Beta V@R contribution with $\alpha, \beta \in \mathbb{N}$, one should generate
$x_{kl}$, $w_{kl}$ similarly. Let $l_{k1}, \dots, l_{k\beta}$ be the numbers $l \in \{1, \dots, \alpha\}$ such that the
corresponding $w_{kl}$ stand at the first $\beta$ places (in the increasing order) among $w_{k1}, \dots, w_{k\alpha}$.
According to Example 5.6 (vi), an empirical estimate of $\rho^c_{\alpha,\beta}(X; W)$ is provided by

$$\rho^c_e(X; W) = -\frac{1}{K\beta} \sum_{k=1}^{K} \sum_{i=1}^{\beta} x_{k l_{ki}}.$$

In order to estimate the Weighted V@R contribution, one should fix a data set
$(x_1, w_1), \dots, (x_T, w_T)$ and a measure $\nu$ on $\{1, \dots, T\}$ using one of the procedures
described at the end of Section 2. Let $w_{(1)}, \dots, w_{(T)}$ be the values $w_1, \dots, w_T$ in the
increasing order (we assume that all the $w_t$ are different). Define $n(t)$ through the equality
$w_{(t)} = w_{n(t)}$. According to Example 5.6 (iv), an empirical estimate of $\rho^c_\mu(X; W)$ is
provided by

$$\rho^c_e(X; W) = -\sum_{t=1}^{T} x_{n(t)} \int_{z_{t-1}}^{z_t} \psi_\mu(x)\,dx,$$

where $z_t = \sum_{i=1}^{t} \nu\{n(i)\}$ and $\psi_\mu$ is given by (2.7).
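
The three estimators above translate almost line by line into array operations. The sketch below is a minimal illustration, not a definitive implementation: all function names are hypothetical, the draws $x_{kl}$, $w_{kl}$ are assumed to be supplied as arrays of shape $(K, \alpha)$, and the Weighted V@R routine takes the antiderivative $\Psi(z) = \int_0^z \psi_\mu(x)\,dx$ as an input instead of $\psi_\mu$ itself.

```python
import numpy as np

def alpha_var_contribution(x, w):
    """Estimate of rho^c_alpha(X; W) from draws x[k, l], w[k, l] of shape (K, alpha)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    l_k = np.argmin(w, axis=1)                          # l_k = argmin_l w_{kl}
    return -float(np.mean(x[np.arange(x.shape[0]), l_k]))

def beta_var_contribution(x, w, beta):
    """Estimate of rho^c_{alpha,beta}(X; W): average x over the beta smallest w per trial."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    lowest = np.argsort(w, axis=1)[:, :beta]            # l_{k1}, ..., l_{k beta}
    return -float(np.mean(np.take_along_axis(x, lowest, axis=1)))

def weighted_var_contribution(x, w, nu, Psi):
    """Estimate of rho^c_mu(X; W); Psi(z) is the integral of psi_mu over [0, z]."""
    x = np.asarray(x, dtype=float)
    nu = np.asarray(nu, dtype=float)
    order = np.argsort(np.asarray(w, dtype=float))      # n(1), ..., n(T)
    z = np.clip(np.cumsum(nu[order]), 0.0, 1.0)         # z_t = sum_{i<=t} nu{n(i)}
    z_prev = np.concatenate(([0.0], z[:-1]))
    return -float((Psi(z) - Psi(z_prev)) @ x[order])

def alpha_Psi(alpha):
    """Psi for Alpha V@R, where psi_alpha(x) = alpha * (1 - x)**(alpha - 1)."""
    return lambda z: 1.0 - (1.0 - z) ** alpha
```

With $\beta = 1$ the Beta V@R estimator reduces to the Alpha V@R one, and with $\alpha = 1$ the weighted estimator reduces to minus the $\nu$-weighted mean of the $x_t$.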



6 Factor Risk Contribution 

1. Factor risk contribution. Let $(\Omega, \mathcal{F}, P)$ be a probability space and $u$ be a coherent
utility with the determining set $\mathcal{D}$. Let $Y$ be a random variable (resp., random vector)
meaning the increment of some market factor (resp., factors) over the unit time period
and $W$ be a random variable meaning the P&L produced by some portfolio over the unit
time period.

Definition 6.1. The factor utility contribution is

$$u^{fc}(X; Y; W) = \inf_{Q \in \mathcal{X}_{E(\mathcal{D} \mid Y)}(W)} E_Q X, \quad X \in L^0.$$

The factor risk contribution is $\rho^{fc}(X; Y; W) = -u^{fc}(X; Y; W)$.

The function $u^{fc}(\,\cdot\,; Y; W)$ is a coherent utility provided that $\mathcal{X}_{E(\mathcal{D} \mid Y)}(W) \ne \emptyset$.

Proposition 6.2. If $\mathcal{D}$ is $L^1$-closed and uniformly integrable, while
$W \in L^1_s(E(\mathcal{D} \mid Y))$, then $\mathcal{X}_{E(\mathcal{D} \mid Y)}(W) \ne \emptyset$.

Proof. By Remark (iii) following Definition 3.2, $E(\mathcal{D} \mid Y)$ is $L^1$-closed and uniformly
integrable. Now, the result follows from Proposition 5.2. □

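Theorem 6.3. If $X, W \in L^1_s(\mathcal{D}) \cap L^1_s(E(\mathcal{D} \mid Y)) \cap L^1$, then

$$u^{fc}(X; Y; W) = u^c(E(X \mid Y); E(W \mid Y)).$$

Proof. By Lemma 3.4,

$$EWE(Z \mid Y) = EE(W \mid Y)Z, \quad Z \in \mathcal{D}.$$

Thus,

$$E(Z \mid Y) \in \mathcal{X}_{E(\mathcal{D} \mid Y)}(W) \iff Z \in \mathcal{X}_{\mathcal{D}}(E(W \mid Y)),$$

which means that

$$\mathcal{X}_{E(\mathcal{D} \mid Y)}(W) = E(\mathcal{X}_{\mathcal{D}}(E(W \mid Y)) \mid Y).$$

One more application of Lemma 3.4 yields

$$EXE(Z \mid Y) = EE(X \mid Y)Z, \quad Z \in \mathcal{D}.$$

As a result,

$$u^{fc}(X; Y; W) = \inf_{Z \in \mathcal{X}_{\mathcal{D}}(E(W \mid Y))} EXE(Z \mid Y) = \inf_{Z \in \mathcal{X}_{\mathcal{D}}(E(W \mid Y))} EE(X \mid Y)Z = u^c(E(X \mid Y); E(W \mid Y)). \qquad □$$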

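Corollary 6.4. If $\mathcal{D}$ is $L^1$-closed and uniformly integrable, while
$X, W \in L^1_s(\mathcal{D}) \cap L^1_s(E(\mathcal{D} \mid Y)) \cap L^1$, then

$$u^{fc}(X; Y; W) = \lim_{\varepsilon \downarrow 0} \varepsilon^{-1}(u^f(W + \varepsilon X; Y) - u^f(W; Y)).$$

Proof. Applying successively Theorems 6.3, 5.5, and 3.3, we get

$$u^{fc}(X; Y; W) = u^c(E(X \mid Y); E(W \mid Y))$$
$$= \lim_{\varepsilon \downarrow 0} \varepsilon^{-1}(u(E(W \mid Y) + \varepsilon E(X \mid Y)) - u(E(W \mid Y)))$$
$$= \lim_{\varepsilon \downarrow 0} \varepsilon^{-1}(u^f(W + \varepsilon X; Y) - u^f(W; Y))$$

(in order to apply Theorem 5.5 in the second equality, we need to check that
$E(X \mid Y) \in L^1_s(\mathcal{D})$ and $E(W \mid Y) \in L^1_s(\mathcal{D})$; this is done by the same argument as in the
proof of Lemma 3.4). □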

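Remark. If $(\Omega, \mathcal{F}, P)$ is atomless and $u$ is law invariant, then, by Corollary A.2, the
integrability condition on $X$, $W$ in the above statements can be replaced by a weaker one:
$X, W \in L^1_s(\mathcal{D})$. □

Example 6.5. (i) If $W$ is a constant, then $\mathcal{X}_{E(\mathcal{D} \mid Y)}(W) = E(\mathcal{D} \mid Y)$, so that
$u^{fc}(X; Y; W) = u^f(X; Y)$.

(ii) If $X = \lambda W$ with $\lambda \in \mathbb{R}_+$, then $u^{fc}(X; Y; W) = \lambda u^f(W; Y)$ provided that
$\mathcal{X}_{E(\mathcal{D} \mid Y)}(W) \ne \emptyset$.

(iii) Let $u$ be law invariant and $(X, Y, W) = (X, Y^1, \dots, Y^M, W)$ be a non-degenerate
Gaussian random vector such that each of its components belongs to $L^1_s(\mathcal{D})$. Let $C$ denote
the covariance matrix of $Y$ and set $a^m = \mathrm{cov}(X, Y^m)$, $b^m = \mathrm{cov}(W, Y^m)$, $\bar{X} = X - EX$,
$\bar{Y} = Y - EY$, $\bar{W} = W - EW$. We have (cf. the Gaussian example of Section 3)
$E(\bar{X} \mid Y) = \langle C^{-1}a, \bar{Y}\rangle$, $E(\bar{W} \mid Y) = \langle C^{-1}b, \bar{Y}\rangle$, so that, by Theorem 6.3 and
Example 5.6 (iii),

$$u^{fc}(X; Y; W) = EX + u^{fc}(\bar{X}; \bar{Y}; \bar{W})$$
$$= EX + u^c(\langle C^{-1}a, \bar{Y}\rangle; \langle C^{-1}b, \bar{Y}\rangle)$$
$$= EX - \gamma\,\frac{\mathrm{cov}(\langle C^{-1}a, \bar{Y}\rangle, \langle C^{-1}b, \bar{Y}\rangle)}{(\mathrm{var}\,\langle C^{-1}b, \bar{Y}\rangle)^{1/2}}$$
$$= EX - \gamma\,\frac{\langle C^{-1}a, b\rangle}{\langle C^{-1}b, b\rangle^{1/2}}.$$

In particular, if $Y$ is one-dimensional, then

$$u^{fc}(X; Y; W) = EX - \gamma\,\frac{\mathrm{cov}(X, Y)}{(\mathrm{var}\,Y)^{1/2}}\,\mathrm{sgn}\,\mathrm{cov}(W, Y).$$

If moreover $\mathrm{cov}(X, Y) \ge 0$ and $\mathrm{cov}(W, Y) > 0$, then, recalling the Gaussian example of
Section 3 and Example 5.6 (iii), we get

$$u^f(X; Y) = u^c(X; Y) = u^{fc}(X; Y; W) = EX - \gamma\,\frac{\mathrm{cov}(X, Y)}{(\mathrm{var}\,Y)^{1/2}}. \qquad □$$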



2. Empirical estimation. In view of Theorem 6.3, the empirical estimation of
$\rho^{fc}(X; Y; W)$ reduces to finding the functions $f(y) = E(X \mid Y = y)$, $g(y) = E(W \mid Y = y)$
and then applying the procedures described at the end of Section 5 to $x_{kl} := f(y_{kl})$,
$w_{kl} := g(y_{kl})$.

If $Y$ is one-dimensional, one can create the data for $Y$ using the time change or the
scaling procedures described at the end of Section 3.
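
The two-step recipe (fit the conditional means, then apply a contribution estimator to the fitted values) can be sketched as follows for a one-dimensional factor and Alpha V@R. This is an illustration under assumptions: the conditional means are approximated by a polynomial least-squares fit, which is only one of many possible regression choices, and the function names are hypothetical.

```python
import numpy as np

def fit_conditional_mean(y, target, degree=1):
    """Polynomial least-squares proxy for y -> E(target | Y = y).

    Assumption: a low-degree polynomial is adequate; any regression could be used.
    """
    coeffs = np.polyfit(np.asarray(y, dtype=float), np.asarray(target, dtype=float), degree)
    return lambda v: np.polyval(coeffs, v)

def factor_alpha_var_contribution(f, g, y_draws):
    """Alpha V@R estimate of rho^fc(X; Y; W) from factor draws of shape (K, alpha)."""
    y_draws = np.asarray(y_draws, dtype=float)
    x = f(y_draws)                                   # x_{kl} = f(y_{kl})
    w = g(y_draws)                                   # w_{kl} = g(y_{kl})
    l_k = np.argmin(w, axis=1)                       # worst fitted portfolio value per trial
    return -float(np.mean(x[np.arange(x.shape[0]), l_k]))
```

In use, `f` and `g` would be fitted on historical triples $(x_t, y_t, w_t)$, and `y_draws` would be generated by the time change or the scaling procedure.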



7 Optimal Risk Sharing 

1. Problem. Let $(\Omega, \mathcal{F}, P)$ be a probability space and $u^1, \dots, u^M$ be coherent utilities
with the determining sets $\mathcal{D}^1, \dots, \mathcal{D}^M$. We assume that each $\mathcal{D}^m$ is $L^1$-closed and
uniformly integrable. Suppose there is a firm consisting of $N$ desks, and the $n$-th desk can
invest into the assets that produce P&Ls $(X^{n1}, \dots, X^{nd_n})$. We assume that $X^{nk} \in L^1_s(\mathcal{D}^m)$
for any $k, n, m$. The set of P&Ls that the $n$-th desk can produce over a unit time period
is $\{\langle h^n, X^n\rangle : h^n \in H^n\}$, where $X^n = (X^{n1}, \dots, X^{nd_n})$ and $H^n$ is a convex subset of $\mathbb{R}^{d_n}$
with a non-empty interior meaning the constraint on the portfolio of the $n$-th desk. We
assume that $u^m(\langle h^n, X^n\rangle) < 0$ for any $h^n \in H^n \setminus \{0\}$, which means that any possible
trade has a strictly positive risk. Let $E^n \in \mathbb{R}^{d_n}$ be the vector of rewards for the assets
available to the $n$-th desk. This is the vector of subjective assessments by the $n$-th desk
of the profitability of the assets $X^{n1}, \dots, X^{nd_n}$.

We will consider the following optimization problem for the whole firm:

$$\begin{cases} \sum_n \langle h^n, E^n\rangle \longrightarrow \max, \\ h^n \in H^n, \quad n = 1, \dots, N, \\ \rho^m\big(\sum_n \langle h^n, X^n\rangle\big) \le c^m, \quad m = 1, \dots, M, \end{cases} \tag{7.1}$$

where $c^1, \dots, c^M \in (0, \infty)$ are fixed risk limits. This problem is a generalization of (4.1),
which might be considered as the optimization problem for a separate desk. We will
not try to solve (7.1) for the following reason: if this problem admitted a solution that
can be implemented in practice, this would mean that the central management is able
to optimize the firm's portfolio and there would be no need for the existence of separate
desks. Instead of trying to solve (7.1), we will study the following question: Is it possible
to decentralize (7.1), i.e. to create for the desks conditions such that the global optimum
is achieved when each desk acts optimally?

Hypothesis 1. There exist $c^{nm}$ such that if $h^n_*$ satisfy the conditions

1. we have

$$\rho^m\Big(\sum_{n=1}^{N} \langle h^n_*, X^n\rangle\Big) \le c^m, \quad m = 1, \dots, M,$$

and the equality is attained at least for one $m$;

2. for each $n$, the vector $h^n_*$ solves the problem

$$\begin{cases} \langle h^n, E^n\rangle \longrightarrow \max, \\ h^n \in H^n, \\ \rho^m(\langle h^n, X^n\rangle) \le c^{nm}, \quad m = 1, \dots, M, \end{cases} \tag{7.2}$$

then $(h^1_*, \dots, h^N_*)$ solves (7.1).

This hypothesis is wrong as shown by the example below.

Example 7.1. Let $M = 1$, $N = 2$, $u$ be a law invariant coherent utility that is finite
on Gaussian random variables, $H^1 = \mathbb{R}$, $H^2 = \mathbb{R}^2$, and $(X^1, X^{21}, X^{22})$ have a jointly
Gaussian distribution with a non-degenerate covariance matrix $C$. It was shown in
[15; Subsect. 2.2] that the solution of (7.1) has the form

$$(h^1_*, h^{21}_*, h^{22}_*) = \mathrm{const} \times C^{-1}(E^1, E^{21}, E^{22}).$$

Furthermore, the solution of (7.2) with $n = 2$ has the form

$$(\tilde{h}^{21}, \tilde{h}^{22}) = \mathrm{const} \times \tilde{C}^{-1}(E^{21}, E^{22}),$$

where $\tilde{C}$ is the covariance matrix of $(X^{21}, X^{22})$ (the constant here depends on $c^{nm}$). It
is clear that there need not exist $c^{nm}$ such that $(h^{21}_*, h^{22}_*) = (\tilde{h}^{21}, \tilde{h}^{22})$. □

2. Limits on risk contribution. The reason why Hypothesis 1 is wrong is that in
general

$$\rho\Big(\sum_{n=1}^{N} \langle h^n, X^n\rangle\Big) \ne \sum_{n=1}^{N} \rho(\langle h^n, X^n\rangle).$$

On the other hand, we typically have

$$\rho\Big(\sum_{n=1}^{N} \langle h^n, X^n\rangle\Big) = \sum_{n=1}^{N} \rho^c\Big(\langle h^n, X^n\rangle;\ \sum_{n'=1}^{N} \langle h^{n'}, X^{n'}\rangle\Big)$$

(see (5.6)). This gives rise to

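Hypothesis 2. Let $c^{nm} \in \mathbb{R}_+$ be such that $\sum_n c^{nm} = c^m$ for each $m$. If $h^n_*$ satisfy
the conditions

1. we have

$$\rho^m\Big(\sum_{n=1}^{N} \langle h^n_*, X^n\rangle\Big) \le c^m, \quad m = 1, \dots, M,$$

and the equality is attained at least for one $m$;

2. for each $n$, the vector $h^n_*$ solves the problem

$$\begin{cases} \langle h^n, E^n\rangle \longrightarrow \max, \\ h^n \in H^n, \\ (\rho^m)^c\big(\langle h^n, X^n\rangle;\ \sum_{n'} \langle h^{n'}_*, X^{n'}\rangle\big) \le c^{nm}, \quad m = 1, \dots, M, \end{cases}$$

then $(h^1_*, \dots, h^N_*)$ solves (7.1).

This hypothesis is also wrong as shown by the example below.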

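Example 7.2. Let $M = 1$, $N = 2$, $u$ be a law invariant coherent utility,
$H^1 = H^2 = \mathbb{R}$, $X^1, X^2 \in L^1(\mathcal{D})$ be jointly Gaussian with a non-degenerate covariance
matrix, and $E^1 = E^2 = 1$. Take arbitrary $h_*^1, h_*^2 \in (0, \infty)$ such that $\rho(h_*^1 X^1 + h_*^2 X^2) = c$
and

$$\mathsf{E}X^n < \gamma\,\frac{\operatorname{cov}(X^n,\ h_*^1 X^1 + h_*^2 X^2)}{(\operatorname{var}(h_*^1 X^1 + h_*^2 X^2))^{1/2}}, \quad n = 1, 2,$$

where $\gamma$ is provided by Example 2.11 (i). Set

$$c^n := \rho^c(h_*^n X^n;\ h_*^1 X^1 + h_*^2 X^2) = -h_*^n\,\mathsf{E}X^n + h_*^n\,\gamma\,\frac{\operatorname{cov}(X^n,\ h_*^1 X^1 + h_*^2 X^2)}{(\operatorname{var}(h_*^1 X^1 + h_*^2 X^2))^{1/2}}, \quad n = 1, 2$$

(we used Example 5.6 (iii)). Obviously, each $h_*^n$ solves (7.2). On the other hand, $(h_*^1, h_*^2)$
need not be optimal for (7.1). □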

3. Risk trading. The reason why Hypothesis 2 is wrong is that if one unit is more
profitable than another, but it obtains a lower risk limit, then the global optimum cannot
be achieved. In fact, if we replace fixed $c^{nm}$ in Hypothesis 2 by the assumption "there
exist $c^{nm}$ ...", then (as follows from Theorem 7.3) the hypothesis becomes true. But this
leaves open the problem of finding $c^{nm}$. Instead of trying to solve this problem, we will
take another path. Let us assume that the desks are allowed to trade their risk limits, i.e.
they establish themselves (through the supply-demand equilibrium) the price $a^m$ for the
$m$-th risk limit, so that if the $n$-th desk buys from the $n'$-th desk $a$ units of the $m$-th
risk limit, then the $n$-th desk pays the $n'$-th desk the amount $a\,a^m$, the $n$-th desk raises
its $m$-th risk limit by the amount $a$, and the $n'$-th desk lowers its $m$-th risk limit by the
amount $a$.
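
The bookkeeping of one such trade can be sketched as follows. This is only an illustration of the transfer rule described above; the function name `trade_limit`, the array layout and the cash accounts are assumptions made for the sketch and do not appear in the paper.

```python
import numpy as np

def trade_limit(limits, cash, buyer, seller, m, amount, price):
    """Move `amount` units of the m-th risk limit from `seller` to `buyer`.

    limits : (N, M) array, limits[n, m] is the m-th risk limit of desk n
    cash   : (N,) array of desk cash accounts (hypothetical bookkeeping)
    price  : (M,) array, price[m] is the price of one unit of the m-th limit
    Returns updated copies; the inputs are left unchanged.
    """
    limits, cash = limits.copy(), cash.copy()
    limits[buyer, m] += amount            # the buyer raises its m-th limit
    limits[seller, m] -= amount           # the seller lowers its m-th limit
    cash[buyer] -= amount * price[m]      # the buyer pays amount * price
    cash[seller] += amount * price[m]
    return limits, cash

# Two desks, one risk limit: desk 0 buys 0.5 units from desk 1 at price 2.
limits0 = np.array([[1.0], [3.0]])
cash0 = np.zeros(2)
limits1, cash1 = trade_limit(limits0, cash0, buyer=0, seller=1, m=0,
                             amount=0.5, price=np.array([2.0]))
```

A trade only redistributes limits and cash between desks, so the firm-wide limit for each $m$ and the total cash are unchanged.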

Hypothesis 3. Let $c^{nm} \in \mathbb{R}_+$ be such that $\sum_n c^{nm} = c^m$ for each $m$. If $h_*^n \in H^n$,
$a_*^{nm} \in \mathbb{R}$, and $a_*^m \in \mathbb{R}_+$ satisfy the conditions

1. we have

$$\sum_{n=1}^N a_*^{nm} = 0, \quad m = 1, \dots, M,$$

2. we have

$$\rho^m\Bigl(\sum_{n=1}^N \langle h_*^n, X^n\rangle\Bigr) \le c^m, \quad m = 1, \dots, M, \tag{7.3}$$

and the equality is attained at least for one $m$;

3. $a_*^m = 0$ for all $m$ such that inequality (7.3) is strict;

4. for each $n$, the vectors $h_*^n$ and $a_*^n = (a_*^{nm};\ m = 1, \dots, M)$ solve the problem

$$\begin{aligned}
&\langle h^n, E^n\rangle - \langle a^n, a_*\rangle \longrightarrow \max, \\
&h^n \in H^n, \quad a^n \in \mathbb{R}^M, \\
&(\rho^m)^c\Bigl(\langle h^n, X^n\rangle;\ \sum_n \langle h_*^n, X^n\rangle\Bigr) \le c^{nm} + a^{nm}, \quad m = 1, \dots, M,
\end{aligned}$$

then $(h_*^1, \dots, h_*^N)$ solves (7.1).

This hypothesis is true (under minor technical assumptions) as shown by the theorem 
below. 

Theorem 7.3. Let $c^{nm} \in \mathbb{R}_+$ be such that $\sum_n c^{nm} = c^m$ for each $m$. Let $h_*^n$ belong
to the interior of $H^n$ and assume that, for each $m$, the set $\mathcal{X}_{\mathcal{D}^m}\bigl(\sum_n \langle h_*^n, X^n\rangle\bigr)$ consists
of a unique measure $Q^m$. Then the following conditions are equivalent:

(i) $h_*^1, \dots, h_*^N$ solve (7.1);

(ii) there exist $a_*^{nm} \in \mathbb{R}$ and $a_*^m \in \mathbb{R}_+$ that satisfy the conditions of Hypothesis 3;

(iii) there exist $a_*^m \in \mathbb{R}_+$ satisfying conditions 2, 3 of Hypothesis 3 and such that

$$E^n = -\sum_{m=1}^M a_*^m\,\mathsf{E}_{Q^m} X^n, \quad n = 1, \dots, N.$$

Moreover, the sets of possible $a_*$ in (ii) and (iii) are the same.

Remark. The theorem shows that the system finds the optimum regardless of the
allocation $c^{nm}$ of risk limits. (The resulting rewards $\langle h_*^n, E^n\rangle - \langle a_*^n, a_*\rangle$ depend on $c^{nm}$,
but their sum does not depend on $c^{nm}$.) It is also clear from (iii) that the equilibrium
prices $a_*^m$ of risk limits do not depend on $c^{nm}$. □

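Proof of Theorem 7.3. (i) $\Rightarrow$ (iii) Set

$$J = \Bigl\{m : \rho^m\Bigl(\sum_{n=1}^N \langle h_*^n, X^n\rangle\Bigr) = c^m\Bigr\},$$

$$K = \Bigl\{-\sum_{m \in J} \alpha^m\,\mathsf{E}_{Q^m}(X^1, \dots, X^N) : \alpha^m \in \mathbb{R}_+\Bigr\},$$

so that $K$ is a convex closed cone in $\mathbb{R}^d$, where $d = d^1 + \dots + d^N$. Suppose
that $E := (E^1, \dots, E^N) \notin K$. By the Hahn-Banach theorem, there exists
$h = (h^1, \dots, h^N) \in \mathbb{R}^d$ such that $\sup_{x \in K} \langle h, x\rangle \le 0 < \langle h, E\rangle$. Consider $h^n(\varepsilon) = h_*^n + \varepsilon h^n$.
There exists $\delta > 0$ such that, for $\varepsilon \in (0, \delta)$, we have: $h^n(\varepsilon) \in H^n$ for any $n$ and
$\rho^m\bigl(\sum_n \langle h^n(\varepsilon), X^n\rangle\bigr) < c^m$ for any $m \notin J$. Set

$$f(\varepsilon) = \sum_{n=1}^N \langle h^n(\varepsilon), E^n\rangle,$$

$$g^m(\varepsilon) = \rho^m\Bigl(\sum_{n=1}^N \langle h^n(\varepsilon), X^n\rangle\Bigr), \quad m \in J.$$

Then

$$\frac{d}{d\varepsilon}\Big|_{\varepsilon=0} f(\varepsilon) = \sum_{n=1}^N \langle h^n, E^n\rangle = \langle h, E\rangle > 0,$$

and, by Theorem 5.5,

$$\frac{d}{d\varepsilon}\Big|_{\varepsilon=0} g^m(\varepsilon) = -\mathsf{E}_{Q^m} \sum_{n=1}^N \langle h^n, X^n\rangle \le \sup_{x \in K} \langle h, x\rangle \le 0, \quad m \in J.$$

Due to our assumptions, $g^m(0) > 0$. Then we can find $\varepsilon \in (0, \delta)$ and $\beta \in (0, 1)$ such that
$\beta h^n(\varepsilon) \in H^n$ for each $n$, $\beta f(\varepsilon) > f(0)$, and $\beta g^m(\varepsilon) < g^m(0)$ for any $m \in J$. This means
that

$$\sum_{n=1}^N \langle \beta h^n(\varepsilon), E^n\rangle > \sum_{n=1}^N \langle h_*^n, E^n\rangle$$

and

$$\rho^m\Bigl(\sum_{n=1}^N \langle \beta h^n(\varepsilon), X^n\rangle\Bigr) < c^m, \quad m = 1, \dots, M,$$

which is a contradiction. As a result, $E \in K$, which is the desired statement.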
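(iii) $\Rightarrow$ (ii) It follows from the inequality

$$-\sum_{n=1}^N \mathsf{E}_{Q^m} \langle h_*^n, X^n\rangle = \rho^m\Bigl(\sum_{n=1}^N \langle h_*^n, X^n\rangle\Bigr) \le c^m = \sum_{n=1}^N c^{nm}, \quad m = 1, \dots, M,$$

that we can choose $a_*^{nm}$ in such a way that $\sum_n a_*^{nm} = 0$ for any $m$ and
$a_*^{nm} \ge -\mathsf{E}_{Q^m} \langle h_*^n, X^n\rangle - c^{nm}$ for any $n, m$. Then

$$(\rho^m)^c\Bigl(\langle h_*^n, X^n\rangle;\ \sum_{n=1}^N \langle h_*^n, X^n\rangle\Bigr) = -\mathsf{E}_{Q^m} \langle h_*^n, X^n\rangle \le c^{nm} + a_*^{nm}, \quad m = 1, \dots, M.$$

Clearly, for $m \in J$, this inequality is the equality. Assume that there exist $n$ and
$h^n \in H^n$, $a^n \in \mathbb{R}^M$ such that

$$\langle h^n, E^n\rangle - \langle a^n, a_*\rangle > \langle h_*^n, E^n\rangle - \langle a_*^n, a_*\rangle,$$

$$(\rho^m)^c\Bigl(\langle h^n, X^n\rangle;\ \sum_{n=1}^N \langle h_*^n, X^n\rangle\Bigr) \le c^{nm} + a^{nm}, \quad m = 1, \dots, M.$$

This means that, for $\Delta h^n = h^n - h_*^n$ and $\Delta a^n = a^n - a_*^n$, we have

$$\langle \Delta h^n, E^n\rangle > \langle \Delta a^n, a_*\rangle,$$
$$\mathsf{E}_{Q^m} \langle \Delta h^n, X^n\rangle \ge -\Delta a^{nm}, \quad m \in J.$$

This leads to the inequality

$$\langle \Delta h^n, E^n\rangle + \sum_{m=1}^M a_*^m\,\mathsf{E}_{Q^m} \langle \Delta h^n, X^n\rangle > 0,$$

which is a contradiction.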

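(ii) $\Rightarrow$ (i) Suppose that there exist $h^n \in H^n$ such that

$$\sum_{n=1}^N \langle h^n, E^n\rangle > \sum_{n=1}^N \langle h_*^n, E^n\rangle,$$

$$\rho^m\Bigl(\sum_{n=1}^N \langle h^n, X^n\rangle\Bigr) \le c^m, \quad m = 1, \dots, M.$$

Then $h^n(\varepsilon) := h_*^n + \varepsilon(h^n - h_*^n) \in H^n$ for any $\varepsilon \in (0, 1)$. For $\Delta h^n = h^n - h_*^n$, we have
$\sum_n \langle \Delta h^n, E^n\rangle > 0$, and, due to the convexity of the function $\varepsilon \mapsto \rho^m\bigl(\sum_n \langle h^n(\varepsilon), X^n\rangle\bigr)$,

$$\sum_{n=1}^N \mathsf{E}_{Q^m} \langle \Delta h^n, X^n\rangle = -\frac{d}{d\varepsilon}\Big|_{\varepsilon=0} \rho^m\Bigl(\sum_{n=1}^N \langle h^n(\varepsilon), X^n\rangle\Bigr) \ge 0, \quad m \in J.$$

This means that there exists $n$ such that

$$\langle \Delta h^n, E^n\rangle > -\sum_{m=1}^M a_*^m\,\mathsf{E}_{Q^m} \langle \Delta h^n, X^n\rangle.$$

For this $n$, set $a^{nm} = a_*^{nm} - \mathsf{E}_{Q^m} \langle \Delta h^n, X^n\rangle$. Then

$$\langle h^n, E^n\rangle - \langle a^n, a_*\rangle > \langle h_*^n, E^n\rangle - \langle a_*^n, a_*\rangle.$$

Furthermore,

$$\begin{aligned}
(\rho^m)^c\Bigl(\langle h^n, X^n\rangle;\ \sum_{n=1}^N \langle h_*^n, X^n\rangle\Bigr)
&= -\mathsf{E}_{Q^m} \langle h_*^n, X^n\rangle - \mathsf{E}_{Q^m} \langle \Delta h^n, X^n\rangle \\
&= (\rho^m)^c\Bigl(\langle h_*^n, X^n\rangle;\ \sum_{n=1}^N \langle h_*^n, X^n\rangle\Bigr) + a^{nm} - a_*^{nm} \\
&\le c^{nm} + a^{nm}, \quad m = 1, \dots, M,
\end{aligned}$$

which is a contradiction.

In order to complete the proof, it is sufficient to show that any $a_*$ satisfying (ii) also
satisfies (iii). This is obvious. □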



8 Summary and Conclusion 

1. Risk measurement. Consider a firm whose portfolio has the form $W = \sum_{n=1}^N W^n$.
Here $W$ is the P&L produced by the portfolio over the unit time period $\Delta$, which is used
as the basis for risk measurement (typically it is one day); $W^n$ is the P&L of the $n$-th
asset. Let $X$ be the P&L produced by some additional asset or portfolio over the same
period.

The empirical procedure to assess Alpha V@R risk of $W$ and the risk contribution
of $X$ to $W$ is:

1. Fix $\alpha \in \mathbb{N}$.

2. Choose a probability measure $\nu$ on the set of natural numbers, choose the number
of trials $K \in \mathbb{N}$, and generate independent draws $(t_{kl};\ k = 1, \dots, K,\ l = 1, \dots, \alpha)$
from the distribution $\nu$. A natural choice for $\nu$ is a geometric distribution.

3. Calculate the array

$$l_k = \operatorname*{argmin}_{l = 1, \dots, \alpha} \sum_{n=1}^N w^n_{kl}, \quad k = 1, \dots, K.$$

Here $w^n_{kl}$ is the increment of the value of the $n$-th asset in the portfolio over the
time period $[-t_{kl}\Delta, -(t_{kl} - 1)\Delta]$ (the current time instant is 0).¹

4. Calculate the empirical estimates of $\rho_\alpha(W)$ and $\rho^c_\alpha(X; W)$ by the formulas:

$$\rho_e(W) = -\frac{1}{K} \sum_{k=1}^K \sum_{n=1}^N w^n_{k l_k},$$

$$\rho^c_e(X; W) = -\frac{1}{K} \sum_{k=1}^K x_{k l_k}.$$

Here $x_{kl}$ is the realization of $X$ (i.e. the increment of the value of the corresponding
portfolio) over the time period $[-t_{kl}\Delta, -(t_{kl} - 1)\Delta]$.

Note that the same arrays $(t_{kl})$ and $(l_k)$ are used for different $X$. If we measure
risk on a daily basis, steps 1-3 need to be performed only once a day (for example, at
night). Thus, in the morning the central desk announces the arrays $(t_{kl})$ and $(l_k)$,
and when assessing the risk contribution of any trade $X$, any desk should simply take
the realizations of $X$ over the corresponding intervals and insert them into the formula
for $\rho^c_e$. The number of operations required to generate $(t_{kl})$ and $(l_k)$ is of order $\alpha N K$;
the number of operations required to calculate $\rho^c_e$ is of order $K$. In particular, we need
not order the data set (which is needed for estimating V@R).

The above estimation procedure is completely non-linear. Moreover, it is completely 
empirical as it uses no model assumptions on the structure of X and W . 
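
The four steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the layout of the history arrays (row $t-1$ holds the increments over $[-t\Delta, -(t-1)\Delta]$), the geometric parameter `p`, and the redrawing of lags that exceed the available history are all choices made for the illustration.

```python
import numpy as np

def draw_lags(alpha, K, T, p=0.01, rng=None):
    """Step 2: draw the (K, alpha) array (t_kl) from a geometric law on {1, 2, ...}.

    Lags larger than the available history T are redrawn. This truncation
    is an assumption of the sketch and is not part of the procedure in the text.
    """
    rng = np.random.default_rng() if rng is None else rng
    t = rng.geometric(p, size=(K, alpha))
    while (t > T).any():
        bad = t > T
        t[bad] = rng.geometric(p, size=int(bad.sum()))
    return t

def select_scenarios(w_hist, t):
    """Step 3: l_k = argmin over l of sum_n w^n_{kl} (0-based column index).

    w_hist has shape (T, N); row t-1 holds the increments of the N assets
    over the period [-t*Delta, -(t-1)*Delta].
    """
    totals = w_hist[t - 1].sum(axis=-1)    # portfolio increments, shape (K, alpha)
    return totals.argmin(axis=1)           # shape (K,)

def alpha_var(w_hist, t, l):
    """Step 4: empirical Alpha V@R of the whole portfolio W."""
    k = np.arange(t.shape[0])
    return -w_hist[t[k, l] - 1].sum(axis=1).mean()

def alpha_var_contribution(x_hist, t, l):
    """Step 4: empirical risk contribution of X to W; x_hist has shape (T,)."""
    k = np.arange(t.shape[0])
    return -x_hist[t[k, l] - 1].mean()

# Usage on synthetic data (hypothetical numbers).
rng = np.random.default_rng(0)
w_hist = rng.normal(size=(500, 3))
t = draw_lags(alpha=5, K=1000, T=500, rng=rng)
l = select_scenarios(w_hist, t)
rho_W = alpha_var(w_hist, t, l)
contribs = [alpha_var_contribution(w_hist[:, n], t, l) for n in range(3)]
```

Because the arrays `t` and `l` are computed once and reused, each contribution costs one pass over $K$ numbers, and the contributions of the individual assets add up to the estimate for $W$.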

If $X = \sum_{j=1}^J X^j$ is itself a big portfolio (for example, the P&L produced by a desk
of the firm), then both the theoretical risk contribution and its empirical estimate satisfy
the linearity property:

$$\rho^c_e\Bigl(\sum_{j=1}^J X^j;\ W\Bigr) = \sum_{j=1}^J \rho^c_e(X^j; W).$$

The above procedure can be combined with the bootstrap technique, with obvious 
changes. 

A similar procedure can be performed for Beta V@R. The difference is that one
should additionally choose $\beta \in \{1, \dots, \alpha - 1\}$ and find the numbers $l_{k1}, \dots, l_{k\beta}$ such
that the corresponding $\sum_n w^n_{kl}$ stand at the first $\beta$ places (in the increasing order) among
$\sum_n w^n_{k1}, \dots, \sum_n w^n_{k\alpha}$. Then the empirical estimates of $\rho_{\alpha,\beta}(W)$ and $\rho^c_{\alpha,\beta}(X; W)$ are provided by

$$\rho_e(W) = -\frac{1}{K\beta} \sum_{k=1}^K \sum_{i=1}^\beta \sum_{n=1}^N w^n_{k l_{ki}},$$

$$\rho^c_e(X; W) = -\frac{1}{K\beta} \sum_{k=1}^K \sum_{i=1}^\beta x_{k l_{ki}}.$$

¹ Of course, if the price of the $n$-th asset is positive by its nature (for example, the $n$-th asset is a
share), then one can use a finer procedure to determine $w^n_{kl}$, i.e. $w^n_{kl}$ is the current value of the asset
times its relative increment over the period $[-t_{kl}\Delta, -(t_{kl} - 1)\Delta]$.
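
The Beta V@R variant differs from the Alpha V@R procedure only in the selection step: instead of the single worst draw per trial, the $\beta$ worst draws are kept. A minimal sketch, assuming the same hypothetical data layout as before (row $t-1$ of a history array holds the increments over $[-t\Delta, -(t-1)\Delta]$) and an already generated integer array `t` of lags; the function name is an assumption of the sketch.

```python
import numpy as np

def beta_var_estimates(w_hist, x_hist, t, beta):
    """Empirical Beta V@R of W and the risk contribution of X to W.

    w_hist : (T, N) asset increments, x_hist : (T,) increments of X,
    t      : (K, alpha) integer lags with values in 1..T.
    """
    K, alpha = t.shape
    if not 1 <= beta < alpha:
        raise ValueError("beta must lie in {1, ..., alpha - 1}")
    totals = w_hist[t - 1].sum(axis=-1)             # shape (K, alpha)
    # Column indices l_{k1}, ..., l_{k beta} of the beta smallest totals per row.
    worst = np.argsort(totals, axis=1)[:, :beta]    # shape (K, beta)
    rows = np.arange(K)[:, None]
    rho_W = -totals[rows, worst].mean()             # average over K * beta terms
    rho_c = -x_hist[t[rows, worst] - 1].mean()
    return rho_W, rho_c
```

Sorting each row of length $\alpha$ is cheap because $\alpha$ is small; the full data set is still never ordered.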

2. Factor risk measurement. The procedure to assess Alpha V@R factor risks
of $W$ and the factor risk contributions of $X$ to $W$ is:

1. Fix $\alpha \in \mathbb{N}$.

2. Choose the main market factors $Y^1, \dots, Y^M$ affecting the risk of the portfolio. Here
$Y^m$ means the increment of the $m$-th factor over the unit time period.

3. Create procedures for calculating the functions $f^m(y) = \mathsf{E}(X \mid Y^m = y)$ and
$g^{nm}(y) = \mathsf{E}(W^n \mid Y^m = y)$.

4. Choose a probability measure $\nu$ on the set of natural numbers, choose the number
of trials $K$, and generate independent draws $(t_{kl};\ k = 1, \dots, K,\ l = 1, \dots, \alpha)$ from
the distribution $\nu$.

5. Calculate the array

$$l^m_k = \operatorname*{argmin}_{l = 1, \dots, \alpha} \sum_{n=1}^N g^{nm}(y^m_{kl}), \quad k = 1, \dots, K, \quad m = 1, \dots, M.$$

Here $y^m_{kl}$ is the increment of the $m$-th factor over the time period
$[-t_{kl}(\sigma^m)^2\Delta, -(t_{kl} - 1)(\sigma^m)^2\Delta]$, where $\sigma^m$ is the current volatility of the $m$-th
factor (for example, the implied volatility). Instead of this time change procedure,
one can use the scaling procedure.

6. Calculate the empirical estimates of $\rho^f_\alpha(W; Y^m)$ and $\rho^{fc}_\alpha(X; Y^m; W)$ by the formulas:

$$\rho^f_e(W; Y^m) = -\frac{1}{K} \sum_{k=1}^K \sum_{n=1}^N g^{nm}(y^m_{k l^m_k}), \quad m = 1, \dots, M,$$

$$\rho^{fc}_e(X; Y^m; W) = -\frac{1}{K} \sum_{k=1}^K f^m(y^m_{k l^m_k}), \quad m = 1, \dots, M.$$

All the pleasant features of risk estimates described above remain true for factor risks.
Moreover, an important advantage of factor risks is that we can take an arbitrarily large
data set $y_1, \dots, y_T$, while the joint data set $w^n_t$ required for ordinary risk does not exist
for large portfolios.
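
For a single factor $m$, steps 5 and 6 can be sketched as follows. The sketch assumes that the factor increments have already been time-changed or scaled, that the lags `t` are given, and that the conditional-expectation functions are supplied as vectorised callables; `g_total` stands for $y \mapsto \sum_n g^{nm}(y)$. All names are hypothetical.

```python
import numpy as np

def factor_alpha_var(g_total, f, y_hist, t):
    """Empirical Alpha V@R factor risk of W and factor risk contribution of X.

    g_total : vectorised callable, y -> sum_n g^{nm}(y)
    f       : vectorised callable, y -> f^m(y)
    y_hist  : (T,) increments of the m-th factor; entry t-1 is the increment
              over the t-th historical period (already time-changed or scaled)
    t       : (K, alpha) integer lags with values in 1..T
    """
    y = y_hist[t - 1]                  # y^m_{kl}, shape (K, alpha)
    values = g_total(y)                # sum_n g^{nm}(y^m_{kl})
    l = values.argmin(axis=1)          # l^m_k (0-based column index)
    k = np.arange(t.shape[0])
    rho_f = -values[k, l].mean()
    rho_fc = -f(y[k, l]).mean()
    return rho_f, rho_fc

# Usage with hypothetical linear exposures to the factor.
y_hist = np.array([0.1, -0.3, 0.2, -0.1])
t = np.array([[1, 2], [3, 4]])
rho_f, rho_fc = factor_alpha_var(lambda y: 2.0 * y, lambda y: 0.5 * y, y_hist, t)
```

Only the one-dimensional history of the factor is needed here, which is why the data set can be made as long as desired.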

A similar procedure can be performed for Beta V@R. The difference is that one
should additionally choose $\beta \in \{1, \dots, \alpha - 1\}$ and find the numbers $l^m_{k1}, \dots, l^m_{k\beta}$ such
that the corresponding $\sum_n g^{nm}(y^m_{kl})$ stand at the first $\beta$ places (in the increasing order)
among $\sum_n g^{nm}(y^m_{k1}), \dots, \sum_n g^{nm}(y^m_{k\alpha})$. Then the empirical estimates of $\rho^f_{\alpha,\beta}(W; Y^m)$ and
$\rho^{fc}_{\alpha,\beta}(X; Y^m; W)$ are provided by

$$\rho^f_e(W; Y^m) = -\frac{1}{K\beta} \sum_{k=1}^K \sum_{i=1}^\beta \sum_{n=1}^N g^{nm}(y^m_{k l^m_{ki}}), \quad m = 1, \dots, M,$$

$$\rho^{fc}_e(X; Y^m; W) = -\frac{1}{K\beta} \sum_{k=1}^K \sum_{i=1}^\beta f^m(y^m_{k l^m_{ki}}), \quad m = 1, \dots, M.$$

The functions $g^{nm}$ need not be recalculated every day (if we measure risk on a
daily basis); once computed, these functions might be used for rather a long period. Of
course, the collection of assets in the portfolio changes each day, but the main part of this
collection remains the same, so that when the $(N+1)$-th asset appears in the portfolio, we
should calculate $g^{N+1,m}$, $m = 1, \dots, M$, but we need not recalculate $g^{nm}$, $n = 1, \dots, N$,
$m = 1, \dots, M$.

The factor risk estimation admits a multi-factor version: instead of considering 
M different factor risks, we consider one risk driven by the multidimensional factor 
$Y = (Y^1, \dots, Y^M)$. Alternatively, we could split the factors into $M$ groups so that
the increment of the m-th group is a random vector, which we still denote by Y m , and 
split the portfolio into M groups so that the risk of the m-th group is driven mainly by 
the m-th group of factors. Then when dealing with the m-th factor risk we are consid- 
ering only the m-th part of the portfolio. All the formulas remain the same with obvious 
changes. 

3. Comparison of various techniques. In Table 1, we compare different risk 
measurement techniques proposed in this paper and several classical risk measurement 
techniques (see |H2 Sect. 6] for their description). Let us briefly describe this table. 

All the techniques, except for one-factor risk measurement, assess the overall risk. As
shown by Example the sum of one-factor risks need not exceed the multi-factor risk, 
so that the measurement of one-factor risks might be insufficient. 

For the empirical risk estimation procedure described above, one requires the joint 
time series for all the assets in the portfolio. However, different assets have different 
durations, so that a joint time series might exist only for small portfolios or portfolios 
consisting of long-living assets only. In contrast, all the other methods express the values 
of the assets through the main factors, and for the factors one can get arbitrarily large 
time series. 

For one-factor risks we can use the time change technique described above, which 
enables one to react immediately to the volatility changes. For parametric V@R and 
Monte Carlo V@R, this can partially be done, but these methods require the covariance 
matrix for different assets, for which we should use historic data. 

The speed of computations for one-factor coherent risks, multi-factor coherent risks, 
and historic V@R is approximately the same because in all these methods the main part 
of computations consists in finding the values $g^{nm}(y^m_{kl})$.

The methods proposed in this paper deal with coherent risk. The arguments of Sec- 
tion El show that in many respects this is much wiser than the use of V@R. 

All the methods of this paper admit simple calculation of risk contributions. For 
parametric V@R, this is also possible because there exist explicit formulas for Gaussian 
V@R contributions. For Monte Carlo V@R, the possibility to calculate risk contributions 
depends on the choice of the probabilistic model. 

All the methods, except for parametric V@R, are completely non-linear. 

Monte Carlo V@R might be performed both for Gaussian and non-Gaussian models, 
but for the latter ones this leads to a further decrease in the speed of computations. 

The empirical risk estimation technique uses absolutely no model assumptions. As for 
the factor risks and historic V@R, there are no assumptions on the probabilistic structure 
of $Y^m$, but model assumptions are used when calculating the functions $f^m$, $g^{nm}$.

In conclusion, the best methods are one-factor and multi-factor coherent risk
measurement. It might be reasonable to use both of them simultaneously.

Table 1. Comparison of various risk measurement techniques. The
relative strengths of different methods are shown by the extent of shading.



4. Risk management. Two practical recommendations of this paper concerning
risk management are as follows:

1. The central management of a firm should impose the limits not on the outstanding
risks (resp., factor risks) of the desks' portfolios, but rather on their risk contributions
(resp., factor risk contributions) to the capital of the whole firm. In view of
the formulas given above, this can be done simply by announcing the arrays $(t_{kl})$
and $(l_k)$ (resp., $(t_{kl})$ and $(l^m_k)$).

2. If the desks are allowed to trade these risk limits within the firm, then the corre- 
sponding competitive optimum is in fact the global optimum for the whole firm, 
regardless of the initial allocation of the risk limits. 

Appendix: L° Version of the Kusuoka Theorem 

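Theorem A.1. Let $(\Omega, \mathcal{F}, \mathsf{P})$ be atomless and $u$ be a coherent utility on $L^0$ with the
determining set $\mathcal{D}$. The following conditions are equivalent:

(i) $u$ is law invariant;

(ii) $\mathcal{D}$ is law invariant (i.e. if $Z \in \mathcal{D}$ and $Z' \overset{\mathrm{Law}}{=} Z$, then $Z' \in \mathcal{D}$);

(iii) there exists a set $\mathfrak{M}$ of probability measures on $(0, 1]$ such that

$$u(X) = \inf_{\mu \in \mathfrak{M}} u_\mu(X), \quad X \in L^0;$$

(iv) there exists a set $\mathfrak{M}$ of probability measures on $(0, 1]$ such that $\mathcal{D} = \bigcup_{\mu \in \mathfrak{M}} \mathcal{D}_\mu$.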

Proof. In essence, the reasoning will follow the lines of the proof of the $L^\infty$-version
of the theorem borrowed from [2U Sect. 4.5].

(i) $\Rightarrow$ (ii) As $(\Omega, \mathcal{F}, \mathsf{P})$ is atomless, it supports a random variable $X$ with any given
distribution $Q$ on the real line. Then the equality $\widetilde u(Q) := u(X)$ correctly defines a
function on the set of distributions. By the definition,

$$\begin{aligned}
\mathcal{D} &= \{Z : Z \ge 0,\ \mathsf{E}Z = 1,\ \text{and}\ \mathsf{E}XZ \ge u(X)\ \ \forall X \in L^0\} \\
&= \bigcap_Q \{Z : Z \ge 0,\ \mathsf{E}Z = 1,\ \text{and}\ \mathsf{E}XZ \ge \widetilde u(Q)\ \ \forall X \text{ with } \operatorname{Law} X = Q\},
\end{aligned}$$
where the intersection is taken over all the probability measures on R. By the Hardy- 
Littlewood inequality (see |Z2 Th. A.24]), 

$$\mathsf{E}X^+Z \ge \int_0^1 q_s(X^+)\,q_{1-s}(Z)\,ds = \int_{F(0)}^1 q_s(X)\,q_{1-s}(Z)\,ds,$$

$$\mathsf{E}X^-Z \le \int_0^1 q_s(X^-)\,q_s(Z)\,ds = \int_0^1 q_{1-s}(X^-)\,q_{1-s}(Z)\,ds = -\int_0^{F(0)} q_s(X)\,q_{1-s}(Z)\,ds,$$

where $F$ is the distribution function of $X$. Consequently,

$$\mathsf{E}XZ \ge \int_0^1 q_s(X)\,q_{1-s}(Z)\,ds. \tag{a.1}$$

Fix $Z$ and a probability measure $Q$ on $\mathbb{R}$. As $(\Omega, \mathcal{F}, \mathsf{P})$ is atomless, $Z$ can be represented
as $q_U(Z)$ with a random variable $U$ uniformly distributed on $[0, 1]$. Then $X := q_{1-U}(Q)$
has the distribution $Q$ ($q_\lambda(Q)$ is the $\lambda$-quantile of $Q$). Obviously,

$$\mathsf{E}XZ = \int_0^1 q_s(Q)\,q_{1-s}(Z)\,ds.$$

Combining this with (a.1), we get

$$\inf_{X :\, \operatorname{Law} X = Q} \mathsf{E}ZX = \int_0^1 q_s(Q)\,q_{1-s}(Z)\,ds.$$

Thus, the set $\{Z : \mathsf{E}ZX \ge \widetilde u(Q)\ \forall X \text{ with } \operatorname{Law} X = Q\}$ is law invariant. Hence, $\mathcal{D}$ is law
invariant.
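
A finite-sample analogue of inequality (a.1) may help to see why the bound holds and why it is attained. For two samples of equal size, the integral $\int_0^1 q_s(X)\,q_{1-s}(Z)\,ds$ corresponds to pairing the smallest value of one sample with the largest value of the other, and the rearrangement inequality says that no other pairing gives a smaller average product. The sketch below is only this discrete illustration on synthetic data; it is not the argument used in the proof.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)           # sample playing the role of X
z = rng.exponential(size=n)      # nonnegative sample playing the role of Z
z /= z.mean()                    # normalise so that the sample mean of Z is 1

# Discrete analogue of the integral of q_s(X) q_{1-s}(Z) ds:
# the smallest x is paired with the largest z, and so on.
lower_bound = np.mean(np.sort(x) * np.sort(z)[::-1])

# Any other pairing with the same marginals gives a value at least as large.
random_pairings = [np.mean(x * rng.permutation(z)) for _ in range(5)]
attained = np.mean(np.sort(x) * np.sort(z)[::-1])   # the extremal pairing itself
```

The extremal pairing plays the role of the choice $X := q_{1-U}(Q)$, $Z = q_U(Z)$ in the proof.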

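(ii) $\Rightarrow$ (iii) The law invariance of $\mathcal{D}$ means that there exists a set $\mathfrak{Q}$ of probability
measures on $\mathbb{R}_+$ such that $\mathcal{D} = \{Z : \operatorname{Law} Z \in \mathfrak{Q}\}$. Fix $Q \in \mathfrak{Q}$. There exists a unique
measure $\mu = \mu(Q)$ on $(0, 1]$ such that $\psi_\mu(x) = q_{1-x}(Q)$, $x \in (0, 1]$ ($\psi_\mu$ is given by (2.7)).
Clearly, $\mu$ is positive and

$$\begin{aligned}
\mu((0, 1]) &= \int_{(0,1]} \int_{(0,y]} y^{-1}\,dx\,\mu(dy) = \int_0^1 \int_{[x,1]} y^{-1}\,\mu(dy)\,dx \\
&= \int_0^1 q_{1-x}(Q)\,dx = \int_{\mathbb{R}_+} x\,Q(dx) = 1.
\end{aligned}$$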

Applying the same argument as above and recalling (j2.fi)) . we get 

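$$\inf_{Z :\, \operatorname{Law} Z = Q} \mathsf{E}XZ = \int_0^1 q_s(X)\,q_{1-s}(Q)\,ds = \int_0^1 q_\alpha(X)\,\psi_\mu(\alpha)\,d\alpha = u_\mu(X), \quad X \in L^0. \tag{a.2}$$

As a result,

$$u(X) = \inf_{Z \in \mathcal{D}} \mathsf{E}XZ = \inf_{Q \in \mathfrak{Q}}\ \inf_{Z :\, \operatorname{Law} Z = Q} \mathsf{E}XZ = \inf_{Q \in \mathfrak{Q}} u_{\mu(Q)}(X), \quad X \in L^0. \tag{a.3}$$

(iii) $\Rightarrow$ (i) This implication follows from the law invariance of $u_\mu$.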

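(ii) $\Rightarrow$ (iv) It follows from (a.2) that $\{Z : \operatorname{Law} Z = Q\} \subseteq \mathcal{D}_{\mu(Q)}$. Hence,
$\mathcal{D} \subseteq \bigcup_{Q \in \mathfrak{Q}} \mathcal{D}_{\mu(Q)}$. On the other hand, if $Z \in \mathcal{D}_{\mu(Q)}$ with some $Q \in \mathfrak{Q}$, then, by (a.3),

$$\mathsf{E}XZ \ge u_{\mu(Q)}(X) \ge u(X), \quad X \in L^0,$$

so that, by definition, $Z \in \mathcal{D}$. Hence, $\mathcal{D} = \bigcup_{Q \in \mathfrak{Q}} \mathcal{D}_{\mu(Q)}$.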

(iv) $\Rightarrow$ (ii) This implication follows from the law invariance of $\mathcal{D}_\mu$, which is seen
from (gUP - □ 

Corollary A.2. Let $(\Omega, \mathcal{F}, \mathsf{P})$ be atomless, $u$ be a law invariant coherent utility with
the determining set $\mathcal{D}$, and $Y$ be a random vector. Then $\mathsf{E}(\mathcal{D} \mid Y) \subseteq \mathcal{D}$.

Proof. By Theorem A.1, $\mathcal{D} = \bigcup_{\mu \in \mathfrak{M}} \mathcal{D}_\mu$ with some set $\mathfrak{M}$ of probability measures on
$(0, 1]$. Using representation (2.12), the convexity of the function $\Phi_\mu$ given by (2.13), and
the Jensen inequality, we get $\mathsf{E}(\mathcal{D}_\mu \mid Y) \subseteq \mathcal{D}_\mu$, which yields the desired statement. □

The following example shows that the law invariance condition in the preceding corol- 
lary is essential. 

Example A.3. Let $\mathcal{D} = \{Z\}$, where $Z \ne 1$. Let $Y$ be independent of $dQ/dP$. Then
$\mathsf{E}(\mathcal{D} \mid Y) = \{1\} \not\subseteq \mathcal{D}$. □